Python SDK v2 for Azure Machine Learning (preview) - how to retrieve workspace from WorkspaceOperations

I’m following this article to create ML pipelines with the new SDK.
So I started by loading the first class
from azure.ai.ml import MLClient
and then I used it to authenticate against my workspace:
ml_client = MLClient(
    credential=credential,
    subscription_id=subscription_id,
    resource_group_name=resource_group_name,
    workspace_name="mmAmlsWksp01",
)
However, I can’t understand how I can retrieve the objects it refers to. For example, it contains a “workspaces” member, but if I run
ml_client.workspaces["mmAmlsWksp01"]
I get the error "'WorkspaceOperations' object is not subscriptable".
So I tried to run
for w in ml_client.workspaces.list():
    print(w)
and it returns the workspace details (name, displayName, id…) for a SINGLE workspace, but not the workspace object.
In fact, the ml_client.workspaces object is a
<azure.ai.ml._operations.workspace_operations.WorkspaceOperations at 0x7f3526e45d60>
but I don't want a WorkspaceOperations object, I want the Workspace itself. How can I retrieve it?

You need to use the get method on workspaces, e.g.
ml_client.workspaces.get("mmAmlsWksp01")
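For example, a minimal sketch building on the ml_client above (the attribute names on the returned Workspace entity are as exposed by the azure-ai-ml preview and may change between versions):
# get() returns a Workspace entity rather than the WorkspaceOperations container
ws = ml_client.workspaces.get("mmAmlsWksp01")
print(ws.name, ws.location, ws.id)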

Related

Azure ML - Can not list/load registered Datasets via Python SDK

I'm quite new to Azure ML and Python. I created some datasets using both the Azure ML GUI and the Python SDK.
Now I want to load these datasets into a pandas DataFrame. But when I run
Dataset.get_all(workspace=workspace)
I get an empty list.
Am I missing something? I'm using version 0.2.7 of azureml and version 1.46.0 of azureml-core.
I also tried
workspace.datasets
but it also returned an empty result.
The datasets need to be registered in the workspace as tabular datasets. If a dataset is registered in another format, we can't retrieve it from the workspace.
I am choosing Local files as the source.
Code:
from azureml.core import Workspace, Dataset

ws = Workspace.get(name="your ws name",
                   subscription_id='your subscription ID',
                   resource_group='your resource group')
Then using the workspace details, we can connect and get dataset details.
Dataset.get_all(ws)
Output:
{'churn3': DatasetRegistration(id='b9fb5e3d-4893-4847-9f2e-497d85edd3b3', name='churn3', version=1, description='', tags={}), 'churn2': DatasetRegistration(id='007a6e1a-268c-409e-adbb-d5437946c4ef', name='churn2', version=1, description='', tags={})}
Achieved the result.
There's a way to do it.
You have to upload the files manually.
In the Azure ML studio:
Data/Data Assets -> Create
Type -> File (Dataset types from Azure ML v1 APIs)
Then you call:
Dataset.get_all(ws)
It returns all datasets, both File and Tabular.
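Once the registered datasets show up, a minimal sketch for getting one of them into pandas (using the 'churn3' name from the output above; to_pandas_dataframe() only applies to TabularDataset registrations):
from azureml.core import Dataset

# Dataset.get_all returns a dict keyed by dataset name
datasets = Dataset.get_all(ws)
tabular_ds = datasets['churn3']

# Materialize the tabular dataset as a pandas DataFrame
df = tabular_ds.to_pandas_dataframe()
print(df.head())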

How to get reference to AzureML Workspace Class in scoring script?

My scoring function needs to refer to an Azure ML Registered Dataset for which I need a reference to the AzureML Workspace object. When including this in the init() function of the scoring script it gives the following error:
"code": "ScoreInitRestart",
"message": "Your scoring file's init() function restarts frequently. You can address the error by increasing the value of memory_gb in deployment_config."
On debugging, the issue turns out to be:
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code [REDACTED] to authenticate.
How can I resolve this issue without exposing Service Principal Credentials in the scoring script?
I found a workaround to reference the workspace in the scoring script. Below is a code snippet of how one can do that -
My deploy script looks like this :
from azureml.core import Environment
from azureml.core.model import InferenceConfig

# Add python dependencies for the models
scoringenv = Environment.from_conda_specification(
    name="scoringenv",
    file_path="config_files/scoring_env.yml"
)

# Create a dictionary to set up the env variables
env_variables = {
    'tenant_id': tenant_id,
    'subscription_id': subscription_id,
    'resource_group': resource_group,
    'client_id': client_id,
    'client_secret': client_secret
}

scoringenv.environment_variables = env_variables

# Configure the scoring environment
inference_config = InferenceConfig(
    entry_script='score.py',
    source_directory='scripts/',
    environment=scoringenv
)
What I am doing here is creating an image with the Python dependencies (in scoring_env.yml) and passing a dictionary of the secrets as environment variables. I have the secrets stored in the key vault.
You may define and pass native python datatype variables.
Now, In my score.py, I reference these environment variables in the init() like this -
import os

tenant_id = os.environ.get('tenant_id')
client_id = os.environ.get('client_id')
client_secret = os.environ.get('client_secret')
subscription_id = os.environ.get('subscription_id')
resource_group = os.environ.get('resource_group')
Once you have these variables, you may create a workspace object using Service Principal authentication like @Anders Swanson mentioned in his reply.
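For completeness, a minimal sketch of that last step, wiring the environment variables above into the standard azureml-core service principal pattern (the workspace name is a placeholder):
from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

svc_pr = ServicePrincipalAuthentication(
    tenant_id=tenant_id,
    service_principal_id=client_id,
    service_principal_password=client_secret
)

# Placeholder workspace name; authentication happens via the service principal
ws = Workspace.get(
    name="my-ml-workspace",
    subscription_id=subscription_id,
    resource_group=resource_group,
    auth=svc_pr
)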
Another way to resolve this may be by using managed identities for AKS. I did not explore that option.
Hope this helps! Please let me know if you found a better way of solving this.
Thanks!
Does your score.py include a Workspace.get() call with auth=InteractiveLoginAuthentication? You should swap it for ServicePrincipalAuthentication (docs), to which you pass your credentials, ideally through environment variables.
import os

from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

svc_pr_password = os.environ.get("AZUREML_PASSWORD")

svc_pr = ServicePrincipalAuthentication(
    tenant_id="my-tenant-id",
    service_principal_id="my-application-id",
    service_principal_password=svc_pr_password)

ws = Workspace(
    subscription_id="my-subscription-id",
    resource_group="my-ml-rg",
    workspace_name="my-ml-workspace",
    auth=svc_pr
)

print("Found workspace {} at location {}".format(ws.name, ws.location))
You can get the workspace object directly from your run.
from azureml.core.run import Run
ws = Run.get_context().experiment.workspace
I came across the same challenge. As you are mentioning AML Datasets, I assume an AML Batch Endpoint is suitable to your scenario. The scoring script for a batch endpoint is meant to receive a list of files as input. When invoking the batch endpoint, you can pass (among the others) AML Datasets (consider that an endpoint is deployed in the context of an AML workspace). Have a look to this.
When running on an AML compute cluster, use the following code:
from azureml.core import Run

run = Run.get_context()
ws = run.experiment.workspace
Note: this works only when the script runs on an AML compute cluster.
Run.get_context() gets the context of the current run on the AML cluster; from that run object we can extract the workspace, which lets code running on the cluster authenticate to the AML workspace.

How to get the Builds for a project using OData in Azure Devops

I'm trying to get all the builds for a project using OData for Azure DevOps. I found there is an endpoint for that, https://analytics.dev.azure.com/{organization}/{project}/_odata/v3.0-preview/Builds, but when I try the same for my project I get the error below:
{"$id":"1","innerException":null,"message":"Resource not found for the segment 'Builds'.","typeName":"Microsoft.OData.UriParser.ODataUnrecognizedPathException, Microsoft.TeamFoundation.OData.Core","typeKey":"ODataUnrecognizedPathException","errorCode":0,"eventId":0}
Is this endpoint not available anymore, or is there something wrong with my query?
The entity set Build has been renamed. You can check this post from our User Voice forum.
All entity sets and entity properties with names starting with Build will be renamed to start with PipelineRun.
For example, the Builds entity set will be called PipelineRuns, and the BuildId entity property will be called PipelineRunId.
So what you should use is: https://analytics.dev.azure.com/{organization}/{project}/_odata/v3.0-preview/PipelineRuns. It works on my side :)
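A minimal sketch of calling that endpoint from Python (organization, project, and the personal access token are placeholders; the $select/$top query options are standard OData and just illustrative):
import requests

organization = "my-org"                   # placeholder
project = "my-project"                    # placeholder
pat = "my-personal-access-token"          # placeholder, needs Analytics read scope

url = (f"https://analytics.dev.azure.com/{organization}/{project}"
       "/_odata/v3.0-preview/PipelineRuns"
       "?$select=PipelineRunId&$top=10")

# Azure DevOps accepts a PAT via basic auth with an empty username
response = requests.get(url, auth=("", pat))
response.raise_for_status()
for run in response.json()["value"]:
    print(run)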

How to connect Google Datastore from a script in Python 3

We want to do some work with the data that is in Google Datastore. We already have a database, and we would like to use Python 3 to handle the data and make queries from a script on our development machines. What would be the easiest way to accomplish this?
From the Official Documentation:
You will need to install the Cloud Datastore client library for Python:
pip install --upgrade google-cloud-datastore
Set up authentication by creating a service account and setting an environment variable; please take a look at the official documentation for more details on this. You can perform this step using either the GCP console or the command line.
Then you will be able to connect to your Cloud Datastore client and use it, as in the example below:
# Imports the Google Cloud client library
from google.cloud import datastore
# Instantiates a client
datastore_client = datastore.Client()
# The kind for the new entity
kind = 'Task'
# The name/ID for the new entity
name = 'sampletask1'
# The Cloud Datastore key for the new entity
task_key = datastore_client.key(kind, name)
# Prepares the new entity
task = datastore.Entity(key=task_key)
task['description'] = 'Buy milk'
# Saves the entity
datastore_client.put(task)
print('Saved {}: {}'.format(task.key.name, task['description']))
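Since the question also asks about making queries, a minimal sketch of reading entities back (reusing the 'Task' kind from the snippet above):
# Query all entities of kind 'Task'
query = datastore_client.query(kind='Task')
for task in query.fetch():
    print(task.key.name, task['description'])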
As @JohnHanley mentioned, you will find a good example in this Bookshelf app tutorial, which uses Cloud Datastore to store its persistent data and metadata for books.
You can create a service account and download the credentials as JSON and then set an environment variable called GOOGLE_APPLICATION_CREDENTIALS pointing to the json file. You can see the details at the link below.
https://googleapis.dev/python/google-api-core/latest/auth.html

How to update the existing web service with a new docker image on Azure Machine Learning Services?

I am currently working on a machine learning project with Azure Machine Learning Services, but I have found that I can't update the existing web service with a new Docker image (I want to keep the same URL as the running web service).
I have read the documentation but it doesn't really tell me how to update (documentation link: https://learn.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where).
The documentation said that we have to use update() with image = new-image.
from azureml.core.webservice import Webservice
service_name = 'aci-mnist-3'
# Retrieve existing service
service = Webservice(name = service_name, workspace = ws)
# Update the image used by the service
service.update(image = new-image)
print(service.state)
But it isn't explained where new-image comes from.
Does anyone know how to figure out this problem?
Thank you
The documentation could be a little clearer on this part, I agree. The new-image is an image object that you should pass into the update() function. If you just created the image, you might already have the object in a variable; in that case, just pass it. If not, you can obtain it from your workspace using
from azureml.core.image.image import Image
new_image = Image(ws, image_name)
where ws is your workspace object and image_name is a string with the name of the image you want to obtain. Then you go on calling update() as
from azureml.core.webservice import Webservice
service_name = 'aci-mnist-3'
# Retrieve existing service
service = Webservice(name = service_name, workspace = ws)
# Update the image used by the service
service.update(image = new_image) # Note that dash isn't supported in variable names
print(service.state)
You can find more information in the SDK documentation.
EDIT:
Both the Image and the Webservice classes above are abstract parent classes.
For the Image object, you should really use one of these classes, depending on your case:
ContainerImage
UnknownImage
(see Image package in the documentation).
For the Webservice object, you should use one of these classes, depending on your case:
AciWebservice
AksWebservice
UnknownWebservice
(see Webservice package in the documentation).
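For instance, a minimal sketch using the concrete classes (the image name is a placeholder, and an ACI deployment is assumed, as in the question):
from azureml.core.image import ContainerImage
from azureml.core.webservice import AciWebservice

# Retrieve the already-built image and the existing service by name
new_image = ContainerImage(ws, name="my-image-name")        # placeholder image name
service = AciWebservice(workspace=ws, name="aci-mnist-3")

# Point the service at the new image and wait for the rollout to finish
service.update(image=new_image)
service.wait_for_deployment(show_output=True)
print(service.state)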

Resources