Payload logging on AWS SageMaker - watson-openscale

How can I enable monitoring and payload logging for a model deployed on AWS SageMaker? I am using a classification model that outputs the predicted class and a confidence score. How should I configure this in the UI or the SDK?

The configuration process in the UI:
Click the second tab on the left and select AWS SageMaker.
Provide the access key info and the region of the AWS SageMaker instance.
Select the deployment(s) you want to monitor.
Use the code snippet provided in a Watson Studio notebook to set up the payload schema (a hedged sketch follows this list).
Configure the fairness and accuracy monitors in the UI. This step is the same as configuring deployments from any other environment (e.g. WML, SPSS).
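The snippet referenced above isn't reproduced here, but a hedged sketch of what payload logging for a classification deployment typically looks like with the ibm-watson-openscale Python SDK follows; the client setup, data set ID, and the request/response fields and values are illustrative assumptions, not the actual snippet.

```python
# Hedged sketch: store one scoring payload for a SageMaker classification
# deployment in Watson OpenScale. The API key, data_set_id, and the
# request/response shapes below are placeholders for illustration.
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson_openscale import APIClient
from ibm_watson_openscale.supporting_classes.payload_record import PayloadRecord

wos_client = APIClient(authenticator=IAMAuthenticator(apikey="<IBM_CLOUD_API_KEY>"))

# Request/response in the scoring format of the monitored model
# (fields and values here are made up for a two-class classifier).
scoring_request = {
    "fields": ["age", "income"],
    "values": [[34, 52000.0]],
}
scoring_response = {
    "fields": ["predicted_label", "probability"],
    "values": [["approved", 0.91]],
}

wos_client.data_sets.store_records(
    data_set_id="<PAYLOAD_LOGGING_DATA_SET_ID>",  # the subscription's payload logging data set
    request_body=[PayloadRecord(request=scoring_request,
                                response=scoring_response,
                                response_time=120)],
)
```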

SageMaker sends all logs produced by your model container to CloudWatch, under a log group named /aws/sagemaker/Endpoints/[EndpointName]. With that said, you could simply configure your model container to log inference payloads and outputs, and they will show up in the CloudWatch logs.
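As a rough illustration of that approach, anything the container writes to stdout/stderr ends up in that log group. A minimal sketch of an inference script that logs requests and predictions, following the input_fn/predict_fn convention of the SageMaker framework serving containers (the model object and payload shapes are assumptions):

```python
# Hedged sketch of an inference script that logs each payload and prediction.
# Output written by the container is forwarded to CloudWatch under
# /aws/sagemaker/Endpoints/[EndpointName].
import json
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

def input_fn(request_body, content_type="application/json"):
    payload = json.loads(request_body)
    logger.info("inference request: %s", json.dumps(payload))
    return payload

def predict_fn(payload, model):
    # 'model' is whatever model_fn loaded; predicted class and confidence
    # are placeholders for a classification model's output.
    predicted_class, confidence = model.predict(payload)
    result = {"predicted_label": predicted_class, "confidence": float(confidence)}
    logger.info("inference response: %s", json.dumps(result))
    return result
```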

Related

How to list all the model monitoring jobs on Vertex AI given the display name?

I have enabled the model monitoring job for my endpoint on the first pipeline run. For the first run, the first version of the model is created for which monitoring is enabled.
Now when I rerun the pipeline, a new version of the model will be created; in that case, how do I enable the model monitoring job for the new model version through the Python SDK?
Or will it be enabled automatically given that both models are deployed to the same endpoint?
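No answer is given in the thread, but as far as the listing part of the title goes, a hedged sketch with the google-cloud-aiplatform SDK looks roughly like this; the project, location, display name, and filter string are assumptions:

```python
# Hedged sketch: list model deployment monitoring jobs matching a display name.
# Assumes the google-cloud-aiplatform SDK; all values are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

jobs = aiplatform.ModelDeploymentMonitoringJob.list(
    filter='display_name="my-monitoring-job"'
)
for job in jobs:
    print(job.resource_name, job.state)
```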

How do I get an available EC2 volume using Python?

I am trying to create a Python script to monitor my multiple AWS EC2 instances and I am using the Boto3 library.
I have gotten stuck when it comes to finding the available space on a volume; there is a method that returns the volume ID and total size, as described in: Boto3 get EC2 instance's volume.
There is no direct way of checking the available space on an EBS volume. AWS provides a way to send custom metrics to CloudWatch using the CloudWatch unified agent.
Run the CloudWatch unified agent on all your EC2 instances.
Send the custom metric (disk utilization) to CloudWatch using the agent.
Use boto3 to check the CloudWatch metric (see the sketch after this list).
Check this link for more information
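A minimal sketch of the boto3 step, assuming the agent publishes its default disk_used_percent metric under the CWAgent namespace; the instance ID, mount path, and dimension set are placeholders that must match your agent configuration:

```python
# Hedged sketch: read the CloudWatch agent's disk utilization metric for one instance.
# Namespace, metric, and dimension names assume the agent's default disk configuration.
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch")

response = cloudwatch.get_metric_statistics(
    Namespace="CWAgent",
    MetricName="disk_used_percent",
    Dimensions=[
        {"Name": "InstanceId", "Value": "i-0123456789abcdef0"},
        {"Name": "path", "Value": "/"},
    ],
    StartTime=datetime.utcnow() - timedelta(minutes=15),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)

for point in response["Datapoints"]:
    print(point["Timestamp"], point["Average"])
```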

How to generate an alert if deployment becomes 'Unhealthy' in Azure Machine Learning?

I deployed an Azure Machine Learning model to AKS, and would like to know how to set an alert if the deployment status changes to any value other than 'Healthy'. I looked at the monitoring metrics in the workspace, but it looks like they are more related to the training process (Model and Run) and quotas. Please let me know if you have any suggestions.
Thanks!
Azure Machine Learning does not provide a way to continuously monitor the health of your webservice and generate alerts.
You can set this up fairly easily using Application Insights (an AML workspace comes with a provisioned Application Insights instance).
You can monitor the webservice scoring endpoint using a URL ping or web test in App Insights.
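A hedged sketch of the two building blocks with the azureml-core SDK: turning on Application Insights for the deployed AKS webservice and printing the scoring URI to use as the target of an App Insights availability (URL ping) test; the workspace config and service name are placeholders:

```python
# Hedged sketch: enable Application Insights on an AKS webservice and print
# the scoring URI to point an App Insights availability test at.
from azureml.core import Workspace
from azureml.core.webservice import AksWebservice

ws = Workspace.from_config()                      # reads config.json for the AML workspace
service = AksWebservice(workspace=ws, name="my-aks-service")

service.update(enable_app_insights=True)          # telemetry flows to the workspace's App Insights
print("Scoring endpoint to monitor:", service.scoring_uri)
```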

Retrain the classification model automatically based on updated data set

We have created an experiment in Azure ML Studio to predict some scheduling activities based on the system data and user data. System data consists of the CPU time, Heap Usage and other system parameters while user data has active sessions of the user and some user-specific data.
Our experiment is working fine and returning results quite similar to what we are expecting, but we are struggling with the following:
1) Our experiment is not considering the updated data for training its models.
2) Every time we are required to upload the data and retrain the models manually.
I wonder if it is really possible to feed live data to the Azure ML experiments using some web services or by using an Azure DB. We are trying to update the data in a CSV file that we have created in Azure storage. That would probably solve our 1st query.
Now, this updated data should be considered to train the model periodically automatically.
It would be great if someone could help us out with this.
Note: We are consuming our model via the web services created with Azure ML Studio.
Step 1: Create 2 web services with Azure ML Studio (one for the training model and one for the predictive model).
Step 2: Create an endpoint for each web service through the Manage Endpoint link in Azure ML Studio.
Step 3: Create 2 new connections in Azure Data Factory (Azure ML, on the Compute tab) and copy the Endpoint Key and API Key that you will find under the Consume tab of the endpoint configuration created in step 2 (Endpoint Key = Batch Requests Key, API Key = Primary Key).
Set Disable Update Resource for the training model endpoint.
Set Enable Update Resource for the predictive model endpoint (Update Resource End Point = Patch key).
Step 4: Create a pipeline with 2 activities (ML Batch Execution and ML Update Resource).
Set the AML linked service for the ML Batch Execution activity to the connection that has Update Resource disabled.
Set the AML linked service for the ML Update Resource activity to the connection that has Update Resource enabled.
Step 5: Set the Web Service Inputs and Outputs.
You need to use Azure Data Factory to retrain the ML model.
You need to create a pipeline with the ML Batch Execution and ML Update Resource activities, and to call your ML model you need to configure the endpoints on the web services (a hedged Python sketch of the two endpoint calls follows the links below).
Here are some links to help you:
https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-machine-learning
https://learn.microsoft.com/en-us/azure/data-factory/update-machine-learning-models
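Outside Data Factory, the same two calls can be made directly. A rough Python sketch of what the ML Batch Execution and ML Update Resource activities do under the hood; the URLs and keys come from each endpoint's Consume tab, and the request bodies below are simplified placeholders rather than the full documented schemas (see the linked docs for the exact formats):

```python
# Hedged sketch: kick off a retraining batch job on the training web service,
# then PATCH the predictive endpoint with the retrained model.
# All URLs, keys, and body contents are placeholders.
import requests

TRAINING_BES_URL = "<batch-execution URL from the training endpoint's Consume tab>"
TRAINING_API_KEY = "<Primary Key of the training endpoint>"

UPDATE_RESOURCE_URL = "<Patch URL from the predictive endpoint's Consume tab>"
PREDICTIVE_API_KEY = "<Primary Key of the predictive endpoint>"

# 1) ML Batch Execution: submit a retraining job against the updated data set.
job = requests.post(
    TRAINING_BES_URL,
    headers={"Authorization": f"Bearer {TRAINING_API_KEY}"},
    json={"GlobalParameters": {}},   # inputs/outputs omitted in this sketch
)
job.raise_for_status()

# 2) ML Update Resource: point the predictive web service at the retrained model.
patch = requests.patch(
    UPDATE_RESOURCE_URL,
    headers={"Authorization": f"Bearer {PREDICTIVE_API_KEY}"},
    json={"Resources": [{"Name": "<trained model resource name>",
                         "Location": {"BaseLocation": "<blob storage URL>",
                                      "RelativeLocation": "<path to retrained .ilearner>",
                                      "SasBlobToken": "<SAS token>"}}]},
)
patch.raise_for_status()
```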

Custom CloudWatch Metrics

I am using AWS RDS SQL Server and I need to do enhanced-level monitoring via CloudWatch. By default some basic monitoring is available, but I want to use custom metrics as well.
In my scenario I need to create an alarm whenever we get a high number of deadlocks in SQL Server. We are able to fetch the details of deadlocks via a script, and I need to prepare a custom metric for the same.
Can anyone help with this or kindly suggest an alternate solution?
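No answer is given in the thread, but a hedged sketch of one common approach: run the existing deadlock-count script on a schedule and push the number to CloudWatch as a custom metric with boto3, then alarm on that metric. The namespace, metric name, dimension, and the get_deadlock_count helper are all illustrative placeholders:

```python
# Hedged sketch: publish a deadlock count as a custom CloudWatch metric.
# get_deadlock_count() stands in for the existing script that queries
# SQL Server; namespace/metric/dimension names are made up.
import boto3

def get_deadlock_count() -> int:
    """Placeholder for the existing script that reads deadlocks from RDS SQL Server."""
    return 3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_data(
    Namespace="Custom/RDS",
    MetricData=[{
        "MetricName": "SqlServerDeadlocks",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": "my-rds-instance"}],
        "Value": float(get_deadlock_count()),
        "Unit": "Count",
    }],
)
# A CloudWatch alarm can then be created on Custom/RDS SqlServerDeadlocks,
# e.g. triggering when the count exceeds a threshold over a period.
```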
