SageMaker PyTorchModel passing custom variables - pytorch

When deploying a model with SageMaker through the PyTorchModel class, is it possible to pass a custom environment variable or kwargs?
I'd like to be able to switch the functionality of the serving code via a custom argument rather than needing to write multiple serve.py scripts to handle different model export methods.
model = PyTorchModel(name='my_model',
                     model_data=estimator.model_data,
                     role=role,
                     framework_version='1.0.0',
                     entry_point='serve.py',
                     source_dir='src',
                     sagemaker_session=sess,
                     predictor_cls=ImagePredictor,
                     <custom_argument?>
                     )

Have you tried using the env parameter of PyTorchModel? (See https://sagemaker.readthedocs.io/en/stable/model.html#sagemaker.model.Model)
model = PyTorchModel(name='my_model',
                     model_data=estimator.model_data,
                     role=role,
                     framework_version='1.0.0',
                     entry_point='serve.py',
                     source_dir='src',
                     sagemaker_session=sess,
                     predictor_cls=ImagePredictor,
                     env={'ENV_VALUE': 'val'}
                     )

This should work (from a trained estimator or from a model, with the high-level Python SDK)
model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
    env={'MY_ENVIRONMENT_VARIABLE': 'value'})
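Environment variables passed this way are set in the serving container, so the inference script can read them with os.environ and branch its behavior. A minimal sketch of what serve.py could do; the variable name ENV_VALUE and the default mode are illustrative assumptions, not SageMaker API:

```python
import os

# Hypothetical snippet for serve.py: branch the serving behavior on the
# custom variable set via env={'ENV_VALUE': 'val'} on the PyTorchModel.
def get_serving_mode(default='standard'):
    """Return the mode requested through ENV_VALUE, or a default."""
    return os.environ.get('ENV_VALUE', default)
```

Inside model_fn (or any other serving hook) you could then do `if get_serving_mode() == 'val': ...` to select the matching model-loading path.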

Related

Optionally use component functions added in VertexAI python SDK

I am using Vertex AI's Python SDK, which is built on top of Kubeflow Pipelines. In it, you supposedly can do this:
train_op = (sklearn_classification_train(
                train_data=data_op.outputs['train_out'])
            .set_cpu_limit(training_cpu_limit)
            .set_memory_limit(training_memory_limit)
            .add_node_selector_constraint(training_node_selector)
            .set_gpu_limit(training_gpu_limit))
where you can chain these methods (set_cpu_limit, set_memory_limit, add_node_selector_constraint, and set_gpu_limit) onto your component. I haven't used this syntax before.
How can I optionally apply each of these methods only when the corresponding variable is specified?
For example, if training_gpu_limit isn't set, I don't want to call set_gpu_limit on the component.
These methods are not appended to the function, but to the component. In your code, print(type(train_op)) shows that train_op is a component object, so there is no way to add parameters to the function itself that would influence the behavior of the component.
You can do it in the pipeline function instead, by changing the code to:
train_op = sklearn_classification_train(train_data=data_op.outputs['train_out'])
if training_cpu_limit:
    train_op.set_cpu_limit(training_cpu_limit)
if training_memory_limit:
    train_op.set_memory_limit(training_memory_limit)
# ... and likewise for add_node_selector_constraint and set_gpu_limit
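If the chain of if-statements gets repetitive, it can be folded into a small helper that calls a configuration method only when its value is set. apply_if_set is a name invented here for illustration; it is not part of the Kubeflow or Vertex AI SDK:

```python
def apply_if_set(op, method_name, value):
    """Call op.<method_name>(value) only when value is set; return op for chaining."""
    if value is not None:
        getattr(op, method_name)(value)
    return op

# Tiny stand-in for a pipeline component, just to show the call pattern.
class DummyOp:
    def __init__(self):
        self.calls = []
    def set_cpu_limit(self, v):
        self.calls.append(('set_cpu_limit', v))
    def set_gpu_limit(self, v):
        self.calls.append(('set_gpu_limit', v))

op = apply_if_set(DummyOp(), 'set_cpu_limit', '4')
op = apply_if_set(op, 'set_gpu_limit', None)  # skipped: value not set
```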

How to extract a dataset from azureml.core.model.Model Class?

Azure Machine Learning Service's Model Artifact has the ability to store references to the Datasets associated with the model. We can use azureml.core.model.Model.add_dataset_references([('relation-as-a-string', Dataset)]) to add these dataset references.
How do we retrieve a Dataset from the references stored in this Model class by using a reference to the Model Class?
get_by_name(workspace, name, version='latest')

Parameters:
- workspace: the existing AzureML workspace in which the Dataset was registered.
- name: the registration name.
- version: the registration version; defaults to 'latest'.

Returns: the registered dataset object.
Consider that a Dataset was added as a reference to a Model under the name 'training_dataset'. To get a reference to this Dataset we use:
model = Model(workspace, name)
dataset_id = next(d['id'] for d in model.serialize()['datasets']
                  if d['name'] == 'training_dataset')
dataset_reference = Dataset.get_by_id(workspace, dataset_id)
After this step we can use dataset_reference like any other AzureML Dataset object.
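The lookup above can be wrapped in a small helper that returns None when no reference matches, instead of raising StopIteration. The payload shape below mirrors what the answer assumes model.serialize()['datasets'] looks like; it is not an official schema:

```python
def find_dataset_id(serialized_model, dataset_name):
    """Return the id of the dataset reference stored under dataset_name, or None."""
    return next((d['id'] for d in serialized_model.get('datasets', [])
                 if d['name'] == dataset_name), None)

# Example payload shaped like the answer's model.serialize() output (assumption).
serialized = {'datasets': [{'name': 'training_dataset', 'id': 'abc-123'},
                           {'name': 'validation_dataset', 'id': 'def-456'}]}
training_id = find_dataset_id(serialized, 'training_dataset')
```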

Export django model to database adding extra fields

For export into a database dump, I need to create a table that is an exact clone of my model but with an extra "summary" column.
Given that the model is concrete, not abstract, subclassing it fails:
class AnnotatedModel(MyModel):
    summary = m.TextField(null=True, blank=True)
creates a new table with only the new field.
I have attempted to use metaclass inheritance instead, but I am stuck because of the Meta inner class of Django's model machinery. Other attempts to completely clone the model with copy/deepcopy have also been unsuccessful. I have had some success using add_to_class, but I am not sure it is a documented user-level function, and it modifies the class deeply, so I have not been able to produce two different, separate models.
The goal is to be able to run a loop, say
for x in MyModel.objects.using('input').all():
    y = cast_to_AnnotatedModelInstance(x)
    y.pk = None
    y.summary = Foo(x)
    y.save(using='output')
without modifying the original model, which lives in a separate package. Ideally, I would prefer x to be MyModel objects, then cast them to AnnotatedModel and save them.
At the moment, what I am doing is expanding the model with add_to_class:
from foo.bar.models import MyModel
MyModel.add_to_class('summary', m.TextField(null=True, blank=True))
then create the export database explicitly
with c['output'].schema_editor() as editor:
    editor.create_model(MyModel)
and then loop as in the question, with using('input').defer("summary") to access the original model of the application:
for x in MyModel.objects.using('input').defer("summary").all():
    x.pk = None
    x.summary = Foo(x)
    x.save(using='output')
Note that, because of add_to_class, the model tries to read the summary column even in the original database; fortunately it can be skipped with defer.

Pytorch: Recover network with customized VGG model that was saved improperly

I am currently doing work with customizing the forward method for models. I was using some tutorial code that ran VGG. I did a few runs with the baseline model and it seemed to work fine. Afterwards, I replaced the forward method for the VGG using:
net.forward = types.MethodType(forward_vgg_new, net)
Unfortunately, the way that the tutorial code saves the models is:
state = {
    'net': net,
    'acc': acc,
    'epoch': epoch,
}
...
torch.save(state, ...)
While this worked for the original tutorial code, loading no longer works for my custom models, as I get:
AttributeError: 'VGG' object has no attribute 'forward_vgg_new'
I have since read from the documentation that it is better for me to save the model's state_dict:
state = {
    'net': net.state_dict(),
    'acc': acc,
    'epoch': epoch,
}
...
torch.save(state, ...)
While I will change the code for future runs, I was wondering if it is possible to salvage the models I have already trained. I naively tried importing the VGG class and adding my forward_vgg_new method to it:
setattr(VGG, 'forward_vgg_new', forward_vgg_new)
before calling torch.load, but it doesn't work.
To solve the problem, I went directly into the VGG library and temporarily added my function so that I could load the saved models and save only their state dicts. I reverted the changes to the VGG library after I recovered the saves. Not the most graceful way of fixing the problem, but it worked.
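For future runs, the state_dict pattern the documentation recommends looks roughly like the sketch below. TinyNet is a stand-in for the tutorial's VGG; because only the tensors are serialized (nothing about the class or its methods is pickled), a monkey-patched forward survives as long as you rebuild and re-patch the model yourself before loading:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # stand-in for the tutorial's VGG
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

net = TinyNet()
# Save only the weights plus bookkeeping; the class itself is not pickled.
torch.save({'net': net.state_dict(), 'acc': 0.0, 'epoch': 0}, 'checkpoint.pth')

# At load time: rebuild the model (attach any custom forward here),
# then restore the weights into it.
restored = TinyNet()
checkpoint = torch.load('checkpoint.pth')
restored.load_state_dict(checkpoint['net'])
```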

'TensorBoard' object has no attribute 'writer' error when using Callback.on_epoch_end()

Since Model.train_on_batch() doesn't take a callback argument, I tried using Callback.on_epoch_end() in order to write my loss to TensorBoard.
However, running the on_epoch_end() method results in the titular error: 'TensorBoard' object has no attribute 'writer'. Other suggested solutions to my original problem of writing to TensorBoard involved calling the Callback.writer attribute, and running those gave the same error. Also, the TensorFlow documentation for the TensorBoard class doesn't mention a writer attribute.
I'm somewhat of a novice programmer, but it seems to me that the on_epoch_end() method is at some point accessing the writer attribute, and I'm confused as to why the method would use an attribute that doesn't exist.
Here's the code I'm using to create the callback:
logdir = "./logs/"
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)
and this is the callback code that I try to run in my training loop:
logs = {
    'encoder': encoder_loss[0],
    'discriminator': d_loss,
    'generator': g_loss,
}
tensorboard_callback.on_epoch_end(i, logs)
where encoder_loss, d_loss, and g_loss are my scalars, and i is the batch number.
Is the error the result of some improper code on my part, or is TensorFlow trying to reference something that doesn't exist?
Also, if anyone knows another way to write to TensorBoard when using Model.train_on_batch, that would also solve my problem.
Since you are using the callback without the fit method, you also need to attach your model to the TensorBoard callback:
logdir = "./logs/"
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logdir)
tensorboard_callback.set_model(model=model)
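As an alternative that sidesteps the callback entirely, TF2's tf.summary API can write scalars directly from a train_on_batch loop. A sketch assuming TensorFlow 2.x; the placeholder loss values stand in for the real d_loss/g_loss scalars:

```python
import tensorflow as tf

logdir = "./logs/"
writer = tf.summary.create_file_writer(logdir)

# Inside the training loop, after computing the batch losses:
d_loss, g_loss = 0.5, 1.2  # placeholders for the real scalars
i = 0                      # batch number, used as the step
with writer.as_default():
    tf.summary.scalar('discriminator', d_loss, step=i)
    tf.summary.scalar('generator', g_loss, step=i)
writer.flush()
```

This writes event files under logdir that TensorBoard picks up, with no TensorBoard callback (and hence no writer attribute) involved.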
