get metrics out of AutoMLRun based on test_data - azure-machine-learning-service

I’m using the following script to execute an AutoML run, also passing the test dataset:
import logging
from azureml.train.automl import AutoMLConfig
automl_settings = {
    "n_cross_validations": 10,
    "primary_metric": 'spearman_correlation',
    "enable_early_stopping": True,
    "max_concurrent_iterations": 10,
    "max_cores_per_iteration": -1,
    "experiment_timeout_hours": 1,
    "featurization": 'auto',
    "verbosity": logging.INFO}
automl_config = AutoMLConfig(task = 'regression',
                             debug_log = 'automl_errors.log',
                             compute_target = compute_target,
                             training_data = training_data,
                             test_data = test_data,
                             label_column_name = label_column_name,
                             model_explainability = True,
                             **automl_settings)

Note that test dataset support is still in private preview. It will probably be released as a public preview later in November, but until then you need to be enrolled in the private preview in order to see "Test runs and metrics" in the UI. To be enabled, you can email me at cesardl at microsoft dot com with your Azure subscription ID.
You can see further info on how to get started here:
https://github.com/Azure/automl-testdataset-preview
As for how to use it, you need to provide either test_data (a specific test AML Tabular Dataset that, for instance, you loaded from a file or split manually beforehand)
or test_size, which is the fraction (e.g. 0.2 is 20%) to be split from the single/original dataset.
As for the test metrics: since you can make multiple test runs against a single model, you need to go to the specific test run available under the "Test results" link.

According to the AutoMLConfig docs for test_data, a test run is only executed automatically when either test_data or the test_size parameter is specified:
If this parameter or the test_size parameter are not specified then no test run will be executed automatically after model training is completed. Test data should contain both features and label column. If test_data is specified then the label_column_name parameter must be specified.
As for how to extract said metrics and predictions, I imagine they'll be associated with the AutoMLRun itself (as opposed to one of the child runs).
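If the test-run metrics turn out to be hard to reach from the run object, one fallback is to compute the primary metric yourself from the model's predictions on the test set. The sketch below computes the Spearman correlation (the primary_metric in the question) in plain Python. The function names and the sample numbers are made up for illustration, and this is not a claim about how AutoML computes the metric internally.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based position of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_correlation(y_true, y_pred):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(y_true), average_ranks(y_pred)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

# hypothetical labels and predictions for a five-row test set
y_true = [3.0, 1.5, 4.2, 2.8, 5.1]
y_pred = [2.9, 1.0, 4.0, 3.1, 4.8]
rho = spearman_correlation(y_true, y_pred)
print(round(rho, 3))  # 0.9
```

Only two neighbouring rows swap order between the labels and the predictions here, which is why the value stays close to 1.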

Related

How to update an existing model in AWS sagemaker >= 2.0

I have an XGBoost model currently in production using AWS sagemaker and making real time inferences. After a while, I would like to update the model with a newer one trained on more data and keep everything as is (e.g. same endpoint, same inference procedure, so really no changes aside from the model itself)
The current deployment procedure is the following :
from sagemaker.xgboost.model import XGBoostModel
from sagemaker.xgboost.model import XGBoostPredictor
xgboost_model = XGBoostModel(
    model_data = <S3 url>,
    role = <sagemaker role>,
    entry_point = 'inference.py',
    source_dir = 'src',
    code_location = <S3 url of other dependencies>,
    framework_version = '1.5-1',
    name = model_name)
xgboost_model.deploy(
    instance_type = 'ml.c5.large',
    initial_instance_count = 1,
    endpoint_name = model_name)
Now that I updated the model a few weeks later, I would like to re-deploy it. I am aware that the .deploy() method creates an endpoint and an endpoint configuration so it does it all. I cannot simply re-run my script again since I would encounter an error.
In previous versions of sagemaker I could have updated the model with an extra argument passed to the .deploy() method called update_endpoint = True. In sagemaker >=2.0 this is a no-op. Now, in sagemaker >= 2.0, I need to use the predictor object as stated in the documentation. So I try the following :
predictor = XGBoostPredictor(model_name)
predictor.update_endpoint(model_name= model_name)
This does update the endpoint according to a new endpoint configuration. However, I do not know what it is updating... Nothing in the two lines above says that the new xgboost_model trained on more data should be used, so where do I tell the update to take the more recent model?
Thank you!
Update
I believe that I need to be looking at production variants as stated in their documentation here. However, their whole tutorial is based on the Amazon SDK for Python (boto3), which has artifacts that are hard to manage when I have different entry points for each model variant (e.g. different inference.py scripts).
Since I found an answer to my own question I will post it here for those who encounter the same problem.
I ended up re-coding my whole deployment script using the boto3 SDK rather than the sagemaker SDK (or a mix of both, as some documentation suggests).
Here's the whole script. It shows how to create a sagemaker model object, an endpoint configuration and an endpoint to deploy the model on for the first time, and how to update the endpoint with a newer model (which was my main question), in case you want to bring your own model and update it safely in production using sagemaker:
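import boto3
import sagemaker  # needed for sagemaker.Session() below
import time
from datetime import datetime
from sagemaker import image_uris
from fileManager import * # this is a local script for helper functions
# name of zipped model and zipped inference code
CODE_TAR = 'your_inference_code_and_other_artifacts.tar.gz'
MODEL_TAR = 'your_saved_xgboost_model.tar.gz'
# sagemaker params
smClient = boto3.client('sagemaker')
smRole = <your_sagemaker_role>
bucket = sagemaker.Session().default_bucket()
# region and INSTANCE_TYPE are used further down but are not defined in the script as posted;
# the two values below are assumptions, adjust them to your setup
region = boto3.Session().region_name
INSTANCE_TYPE = 'ml.c5.large'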
# deploy algorithm
class Deployer:
    def __init__(self, modelName, deployRetrained=False):
        self.modelName = modelName
        self.deployRetrained = deployRetrained
        self.prefix = <S3_model_path_prefix>
    def deploy(self):
        '''
        Main method to create a sagemaker model, create an endpoint configuration and deploy the model. If deployRetrained
        param is set to True, this method will update an already existing endpoint.
        '''
        # define model name and endpoint name to be used for model deployment/update
        model_name = self.modelName + <any_suffix>
        endpoint_config_name = self.modelName + '-%s' %datetime.now().strftime('%Y-%m-%d-%HH%M')
        endpoint_name = self.modelName
        # deploy model for the first time
        if not self.deployRetrained:
            print('Deploying for the first time')
            # here you should copy and zip the model dependencies that you may have (such as preprocessors, inference code, config code...)
            # mine were zipped into the file called CODE_TAR
            # upload model and model artifacts needed for inference to S3
            uploadFile(list_files=[MODEL_TAR, CODE_TAR], prefix = self.prefix)
            # create sagemaker model and endpoint configuration
            self.createSagemakerModel(model_name)
            self.createEndpointConfig(endpoint_config_name, model_name)
            # deploy model and wait while endpoint is being created
            self.createEndpoint(endpoint_name, endpoint_config_name)
            self.waitWhileCreating(endpoint_name)
        # update model
        else:
            print('Updating existing model')
            # upload model and model artifacts needed for inference (here the old ones are replaced)
            # make sure to make a backup in S3 if you would like to keep the older models
            # we replace the old ones and keep the same names to avoid having to recreate a sagemaker model with a different name for the update!
            uploadFile(list_files=[MODEL_TAR, CODE_TAR], prefix = self.prefix)
            # create a new endpoint config that takes the new model
            self.createEndpointConfig(endpoint_config_name, model_name)
            # update endpoint
            self.updateEndpoint(endpoint_name, endpoint_config_name)
            # wait while endpoint updates then delete outdated endpoint config once it is InService
            self.waitWhileCreating(endpoint_name)
            self.deleteOutdatedEndpointConfig(model_name, endpoint_config_name)
    def createSagemakerModel(self, model_name):
        '''
        Create a new sagemaker Model object with an xgboost container and an entry point for inference using boto3 API
        '''
        # Retrieve the inference image (container)
        docker_container = image_uris.retrieve(region=region, framework='xgboost', version='1.5-1')
        # Relative S3 path to pre-trained model to create S3 model URI
        model_s3_key = f'{self.prefix}/' + MODEL_TAR
        # Combine bucket name, model file name, and relative S3 path to create S3 model URI
        model_url = f's3://{bucket}/{model_s3_key}'
        # S3 path to the necessary inference code
        code_url = f's3://{bucket}/{self.prefix}/{CODE_TAR}'
        # Create a sagemaker Model object with all its artifacts
        smClient.create_model(
            ModelName = model_name,
            ExecutionRoleArn = smRole,
            PrimaryContainer = {
                'Image': docker_container,
                'ModelDataUrl': model_url,
                'Environment': {
                    'SAGEMAKER_PROGRAM': 'inference.py', # inference.py is at the root of my zipped CODE_TAR
                    'SAGEMAKER_SUBMIT_DIRECTORY': code_url,
                }
            }
        )
    def createEndpointConfig(self, endpoint_config_name, model_name):
        '''
        Create an endpoint configuration (only for boto3 sdk procedure) and set production variants parameters.
        Each retraining procedure will induce a new variant name based on the endpoint configuration name.
        '''
        smClient.create_endpoint_config(
            EndpointConfigName=endpoint_config_name,
            ProductionVariants=[
                {
                    'VariantName': endpoint_config_name,
                    'ModelName': model_name,
                    'InstanceType': INSTANCE_TYPE,
                    'InitialInstanceCount': 1
                }
            ]
        )
    def createEndpoint(self, endpoint_name, endpoint_config_name):
        '''
        Deploy the model to an endpoint
        '''
        smClient.create_endpoint(
            EndpointName=endpoint_name,
            EndpointConfigName=endpoint_config_name)
    def deleteOutdatedEndpointConfig(self, name_check, current_endpoint_config):
        '''
        Automatically detect and delete endpoint configurations that contain a string 'name_check'. This method can be used
        after a retrain procedure to delete all previous endpoint configurations but keep the current one named 'current_endpoint_config'.
        '''
        # get a list of all available endpoint configurations
        all_configs = smClient.list_endpoint_configs()['EndpointConfigs']
        # loop over the names of endpoint configs
        names_list = []
        for config_dict in all_configs:
            endpoint_config_name = config_dict['EndpointConfigName']
            # get only endpoint configs that contain name_check in them and save names to a list
            if name_check in endpoint_config_name:
                names_list.append(endpoint_config_name)
        # remove the current endpoint configuration from the list (we do not want to delete this one since it is live)
        # the membership check avoids a ValueError when the current config did not match name_check
        if current_endpoint_config in names_list:
            names_list.remove(current_endpoint_config)
        for name in names_list:
            try:
                smClient.delete_endpoint_config(EndpointConfigName=name)
                print('Deleted endpoint configuration for %s' %name)
            except Exception:
                # report the config that failed to delete, not the last name seen in the loop above
                print('INFO : No endpoint configuration was found for %s' %name)
    def updateEndpoint(self, endpoint_name, endpoint_config_name):
        '''
        Update existing endpoint with a new retrained model
        '''
        smClient.update_endpoint(
            EndpointName=endpoint_name,
            EndpointConfigName=endpoint_config_name,
            RetainAllVariantProperties=True)
    def waitWhileCreating(self, endpoint_name):
        '''
        While the endpoint is being created or updated sleep for 60 seconds.
        '''
        # wait while creating or updating endpoint
        status = smClient.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus']
        print('Status: %s' %status)
        while status != 'InService' and status != 'Failed':
            time.sleep(60)
            status = smClient.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus']
            print('Status: %s' %status)
        # in case of a deployment failure raise an error
        if status == 'Failed':
            raise ValueError('Endpoint failed to deploy')
if __name__=="__main__":
    deployer = Deployer('MyDeployedModel', deployRetrained=True)
    deployer.deploy()
Final comments :
The sagemaker documentation mentions all this but fails to state that you can provide an 'entry_point' to the create_model method, as well as a 'source_dir' for inference dependencies (e.g. normalization artifacts). It can be done as shown in the PrimaryContainer argument.
My fileManager.py script just contains basic functions to make tar files and to upload and download to and from my S3 paths. To simplify the class, I have not included them.
The method deleteOutdatedEndpointConfig may seem like overkill, with unnecessary loops and checks. I do it this way because I have multiple endpoint configurations to handle and wanted to remove the ones that aren't live AND contain the string name_check (I do not know the exact name of the configuration since there is a datetime suffix). Feel free to simplify it or remove it altogether.
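As a possible simplification of the last point, the selection logic can be pulled into a small pure function, which makes it easy to check without touching AWS. This is only a sketch: select_outdated_configs and the sample names are hypothetical, and the list of names would still come from list_endpoint_configs as in the script.

```python
def select_outdated_configs(config_names, name_check, current_config):
    """Return the names that contain name_check, excluding the live config."""
    return [
        name for name in config_names
        if name_check in name and name != current_config
    ]

# hypothetical endpoint configuration names, shaped like the ones the script generates
names = [
    "MyDeployedModel-2023-01-05-10H30",
    "MyDeployedModel-2023-02-11-09H15",
    "OtherModel-2023-02-01-08H00",
]
to_delete = select_outdated_configs(
    names, "MyDeployedModel", "MyDeployedModel-2023-02-11-09H15"
)
print(to_delete)  # ['MyDeployedModel-2023-01-05-10H30']
```

Each returned name can then be passed to delete_endpoint_config. boto3's list_endpoint_configs also accepts a NameContains filter, which may remove the need for the substring check, depending on your naming scheme.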
Hope it helps.
The model_name you pass is the name of a SageMaker Model object, and that object is where the image_uri, model_data, etc. are specified.
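To make that concrete, here is a minimal sketch of the three calls involved when each retrained artifact gets its own Model object. It assumes a client that behaves like boto3.client("sagemaker"); the function names, the "-config" suffix and the "AllTraffic" variant name are choices made for this illustration, not something prescribed by the answers above.

```python
from datetime import datetime

def update_endpoint_with_new_model(client, endpoint_name, model_name, image_uri,
                                   model_data_url, role_arn,
                                   instance_type="ml.c5.large"):
    """Register a new Model, point a new endpoint config at it, update the endpoint.

    `client` is expected to behave like boto3.client("sagemaker").
    """
    config_name = f"{model_name}-config"
    # 1. the Model object is where the new artifact (model.tar.gz) is referenced
    client.create_model(
        ModelName=model_name,
        ExecutionRoleArn=role_arn,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
    )
    # 2. the endpoint configuration names the Model that the endpoint should serve
    client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
    )
    # 3. the existing endpoint is switched over to the new configuration
    client.update_endpoint(EndpointName=endpoint_name,
                           EndpointConfigName=config_name)
    return config_name

def versioned_name(base, now=None):
    """Hypothetical helper: append a timestamp so each retrained model gets a unique name."""
    now = now or datetime.now()
    return f"{base}-{now.strftime('%Y-%m-%d-%H-%M-%S')}"
```

With the sagemaker SDK, the last step corresponds to the predictor.update_endpoint(model_name=...) call from the question, as long as model_name refers to a Model that already points at the new artifact.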

Sagemaker inference : how to load model

I have trained a BERT model on sagemaker and now I want to get it ready for making predictions, i.e., inference.
I used pytorch to train the model, and the model is saved to an S3 bucket after training.
Here is the structure inside the model.tar.gz file that is present in the S3 bucket.
Now, I do not understand how I can make predictions with it. I have tried to follow many guides but still could not understand.
Here is something which I have tried:
inference_image_uri = sagemaker.image_uris.retrieve(
    framework='pytorch',
    version='1.7.1',
    instance_type=inference_instance_type,
    region=aws_region,
    py_version='py3',
    image_scope='inference'
)
sm.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    PrimaryContainer={
        'ModelDataUrl': model_s3_dir,
        'Image': inference_image_uri
    }
)
sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "variant1", # The name of the production variant.
            "ModelName": model_name,
            "InstanceType": inference_instance_type, # Specify the compute instance type.
            "InitialInstanceCount": 1 # Number of instances to launch initially.
        }
    ]
)
sm.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name
)
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONLinesSerializer
from sagemaker.deserializers import JSONLinesDeserializer
inputs = [
    {"inputs": ["I have a question [EOT] Hey Manish Mittal ! I'm OneAssist bot. I'm here to answer your queries. [SEP] thanks"]},
    # {"features": ["OK, but not great."]},
    # {"features": ["This is not the right product."]},
]
predictor = Predictor(
    endpoint_name=endpoint_name,
    serializer=JSONLinesSerializer(),
    deserializer=JSONLinesDeserializer(),
    sagemaker_session=sess
)
predicted_classes = predictor.predict(inputs)
for predicted_class in predicted_classes:
    print("Predicted class {} with probability {}".format(predicted_class['predicted_label'], predicted_class['probability']))
I can see that the endpoint was created, but when predicting, it gives me this error:
ModelError: An error occurred (ModelError) when calling the
InvokeEndpoint operation: Received server error (0) from primary with
message "Your invocation timed out while waiting for a response from
container primary. Review the latency metrics for each container in
Amazon CloudWatch, resolve the issue, and try again."
I do not understand how to make it work. Also, do I need to provide an entry script for inference, and if so, where?
Here's detailed documentation on deploying PyTorch models - https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#deploy-pytorch-models
If you're using the default model_fn provided by the estimator, you'll need to have the model as model.pt.
To write your own inference script and deploy the model, see the section on Bring your own model. The pytorch_model.deploy function will deploy it to a real-time endpoint, and then you can use the predictor.predict function on the resulting endpoint variable.
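As a rough idea of what such an inference script can look like, the sketch below defines three of the handler functions the PyTorch serving container looks for (model_fn, input_fn, output_fn) for the JSON Lines payload used in the question. It rests on assumptions that are not in the question: that the archive contains a TorchScript file named model.pt, and that each request line carries an "inputs" list. Tokenization and the forward pass would go in a predict_fn(input_data, model), which depends on the specific BERT model and is left out here.

```python
import json
import os

JSONLINES = "application/jsonlines"

def model_fn(model_dir):
    """Load the model from the directory where model.tar.gz was extracted."""
    import torch  # imported lazily so the request helpers below work without torch
    # assumption: the archive contains a TorchScript file named model.pt
    model = torch.jit.load(os.path.join(model_dir, "model.pt"), map_location="cpu")
    model.eval()
    return model

def input_fn(request_body, request_content_type):
    """Decode a JSON Lines request into a flat list of input strings."""
    if request_content_type != JSONLINES:
        raise ValueError(f"unsupported content type: {request_content_type}")
    if isinstance(request_body, (bytes, bytearray)):
        request_body = request_body.decode("utf-8")
    texts = []
    for line in request_body.splitlines():
        if line.strip():  # skip blank lines
            texts.extend(json.loads(line)["inputs"])
    return texts

def output_fn(prediction, accept):
    """Encode a list of result dicts as JSON Lines, one object per line."""
    if accept != JSONLINES:
        raise ValueError(f"unsupported accept type: {accept}")
    return "\n".join(json.dumps(item) for item in prediction)
```

With the sagemaker SDK the script is passed as entry_point when constructing the model object; with boto3 it is referenced through the container environment, as the SAGEMAKER_PROGRAM and SAGEMAKER_SUBMIT_DIRECTORY entries in the boto3 deployment script earlier on this page do.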

Add run id when registering ml.azure model via python (pipeline)

I have registered a model this way:
from azureml.core.model import Model
model = Model.register(model_path="sklearn_regression_model.pkl",
                       model_name="sklearn_regression_model",
                       tags={'area': "diabetes", 'type': "regression"},
                       description="Ridge regression model to predict diabetes",
                       workspace=ws)
However I would like to add run id, from the experiment, so I can always back-track the model to the experiment that created the model. In azure ml there is a column indicating that it is possible to add run id to a registered model, however the model class doesn't have this parameter.
In order to see the Experiment name and the Run ID in the Azure Machine Learning Studio, I had to use Run.register_model() in the outer pipeline file instead.
It is in some way even better since at that place we get access to the Dataset objects which we can link to the model.
run = Experiment(workspace, "rgb_finetune").submit(pipeline)
run.wait_for_completion(show_output=True)
eval_metrics = run.get_metrics()["Fine-Tuned Evaluation"]
if eval_metrics["AP50"] > 0.5:
    run.find_step_run("finetune.py")[0].register_model(
        model_name="92c5e1a1d1",
        model_path="outputs/model_export",
        properties={"AP50": round(float(eval_metrics["AP50"]), 3)},
        description="RGB model",
        datasets=[("images", images), ("labels", labels)],
    )
First of all, the Model class does have a run ID, which you can check with:
azureml.core.Model.run_id
This contains the ID of the Run that created the Model.
The run_id is an optional ID used to filter returned results.
So, if you register the model first, you should be able to query the run_id.
Alternatively, you can query the run_id from the Run that generated your model, and then register it with a tag such as tags={'run_id': '{your-run-id}'}

How to create an Azure Machine Learning scoring image using a local package

I have a .pkl package saved in my Azure DevOps repository.
The code below searches for the package in the workspace.
How can I provide the package saved in the repository instead?
import json
from azureml.core import Workspace
from azureml.core.image import ContainerImage, Image
ws = Workspace.get(
    name=workspace_name,
    subscription_id=subscription_id,
    resource_group=resource_group,
    auth=cli_auth)
image_config = ContainerImage.image_configuration(
    execution_script="score.py",
    runtime="python-slim",
    conda_file="conda.yml",
    description="Image with ridge regression model",
    tags={"area": "ml", "type": "dev"},
)
image = Image.create(
    name=image_name, models=[model], image_config=image_config, workspace=ws
)
image.wait_for_creation(show_output=True)
if image.creation_state != "Succeeded":
    # f-string so that the actual creation state is shown in the message
    raise Exception(f"Image creation status: {image.creation_state}")
print(
    "{}(v.{} [{}]) stored at {} with build log {}".format(
        image.name,
        image.version,
        image.creation_state,
        image.image_location,
        image.image_build_log_uri,
    )
)
# Writing the image details to /aml_config/image.json
image_json = {}
image_json["image_name"] = image.name
image_json["image_version"] = image.version
image_json["image_location"] = image.image_location
with open("aml_config/image.json", "w") as outfile:
    json.dump(image_json, outfile)
I tried to provide the path to models, but it fails saying the package was not found:
models = $(System.DefaultWorkingDirectory)/package_model.pkl
Register model:
Register a file or folder as a model by calling Model.register().
In addition to the content of the model file itself, your registered model will also store model metadata -- model description, tags, and framework information -- that will be useful when managing and deploying models in your workspace. Using tags, for instance, you can categorize your models and apply filters when listing models in your workspace.
model = Model.register(workspace=ws,
                       model_name='', # Name of the registered model in your workspace.
                       model_path='', # Local file to upload and register as a model.
                       model_framework=Model.Framework.SCIKITLEARN, # Framework used to create the model.
                       model_framework_version=sklearn.__version__, # Version of scikit-learn used to create the model.
                       sample_input_dataset=input_dataset,
                       sample_output_dataset=output_dataset,
                       resource_configuration=ResourceConfiguration(cpu=1, memory_in_gb=0.5),
                       description='Ridge regression model to predict diabetes progression.',
                       tags={'area': 'diabetes', 'type': 'regression'})
print('Name:', model.name)
print('Version:', model.version)
Deploy machine learning models to Azure: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=python
To troubleshoot remote model deployment, please follow the document.
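One likely reason the $(System.DefaultWorkingDirectory) path in the question is not found is that this macro syntax is only expanded inside the pipeline definition; inside a Python script, Azure DevOps exposes the same value as the environment variable SYSTEM_DEFAULTWORKINGDIRECTORY. The sketch below resolves the file from there before registering it. The helper name is hypothetical, and it assumes the .pkl sits directly under that directory.

```python
import os

def resolve_repo_file(filename, env=None):
    """Return the absolute path of a file checked out by the pipeline agent."""
    env = os.environ if env is None else env
    # Azure DevOps exposes System.DefaultWorkingDirectory to scripts under this name;
    # fall back to the current directory when running outside a pipeline
    base_dir = env.get("SYSTEM_DEFAULTWORKINGDIRECTORY", os.getcwd())
    path = os.path.abspath(os.path.join(base_dir, filename))
    if not os.path.isfile(path):
        raise FileNotFoundError(f"model file not found: {path}")
    return path
```

The returned path can be passed as model_path to Model.register, and the registered model object is then what goes into models=[model] when creating the image.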

How to use Pipeline parameters on AzureML

I've built a pipeline in AzureML Designer and I'm trying to use pipeline parameters, but I'm not able to get the values of those parameters in a Python script module.
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-your-first-pipeline
This documentation contains a section called "Use pipeline parameters for arguments that change at inference time" but, unfortunately, it is empty.
I'm defining the parameters on the pipeline setting, see the screenshot on the bottom. Does anyone know how to use the parameters when using the Designer to build the pipeline?
You can correlate each pipeline stage's outputs with its inputs. For example, given the results of model evaluation, we should be able to easily identify all the artifacts (model evaluation configuration, model specification, model parameters, training script, training data, etc.) pertaining to said evaluation.
Azure Machine Learning Pipelines Referenced Article:
https://github.com/Azure/MachineLearningNotebooks/blob/4a3f8e7025334ea8c0de0bada69b031ce54c24a0/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-databricks-as-compute-target.ipynb
We have an AMLS pipeline that we parameterize with a date string, so that the pipeline can be run in the context of historical dates.
Here’s the code we’re using to submit the pipeline:
from azureml.core.authentication import InteractiveLoginAuthentication
import requests
auth = InteractiveLoginAuthentication()
aad_token = auth.get_authentication_header()
rest_endpoint = published_pipeline.endpoint
print("You can perform HTTP POST on URL {} to trigger this pipeline".format(rest_endpoint))
# specify the param when running the pipeline
response = requests.post(rest_endpoint,
                         headers=aad_token,
                         json={"ExperimentName": "dtpred-Dock2RTEG-EX-param",
                               "RunSource": "SDK",
                               "DataPathAssignments": {"input_datapath": {"DataStoreName": "erpgen2datastore", "RelativePath": "teams/PredictiveInsights/DatePrediction/2019/10/10"}},
                               "ParameterAssignments": {"param_inputDate": "2019/10/10"}})
run_id = response.json()["Id"]
print('Submitted pipeline run: ', run_id)
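The receiving side is not shown above, so here is a minimal sketch of how the step's script could read the value. It assumes an SDK-built pipeline in which the PipelineParameter is passed to the script step as a command-line argument; the argument name --input_date and both helper functions are hypothetical, and whether the Designer's Python script module receives parameters the same way is not something the posts above establish.

```python
import argparse
from datetime import datetime

def parse_args(argv=None):
    """Parse the step's command line; argv=None means 'read sys.argv'."""
    parser = argparse.ArgumentParser()
    # hypothetical argument name; it must match what the step passes on the command line
    parser.add_argument("--input_date", type=str, required=True,
                        help="date to process, formatted as YYYY/MM/DD")
    return parser.parse_args(argv)

def to_relative_path(prefix, input_date):
    """Validate the date string and build the dated folder path from it."""
    parsed = datetime.strptime(input_date, "%Y/%m/%d")
    return f"{prefix}/{parsed:%Y/%m/%d}"

# simulate the arguments the step would receive for the run submitted above
args = parse_args(["--input_date", "2019/10/10"])
path = to_relative_path("teams/PredictiveInsights/DatePrediction", args.input_date)
print(path)  # teams/PredictiveInsights/DatePrediction/2019/10/10
```

Parsing the date with strptime makes a malformed value fail at the start of the step rather than further down when the path is read.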
