How to get reference to AzureML Workspace Class in scoring script? - azure-machine-learning-service

My scoring function needs to refer to an Azure ML registered Dataset, for which I need a reference to the AzureML Workspace object. When I include this in the init() function of the scoring script, it gives the following error:
"code": "ScoreInitRestart",
"message": "Your scoring file's init() function restarts frequently. You can address the error by increasing the value of memory_gb in deployment_config."
On debugging, the underlying issue turns out to be an interactive-login prompt:
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code [REDACTED] to authenticate.
How can I resolve this issue without exposing Service Principal Credentials in the scoring script?

I found a workaround to reference the workspace in the scoring script. Below is a code snippet showing how to do that.
My deploy script looks like this:
from azureml.core import Environment
from azureml.core.model import InferenceConfig

# Add python dependencies for the models
scoringenv = Environment.from_conda_specification(
    name="scoringenv",
    file_path="config_files/scoring_env.yml"
)

# Create a dictionary to set up the env variables
env_variables = {
    'tenant_id': tenant_id,
    'subscription_id': subscription_id,
    'resource_group': resource_group,
    'client_id': client_id,
    'client_secret': client_secret
}
scoringenv.environment_variables = env_variables

# Configure the scoring environment
inference_config = InferenceConfig(
    entry_script='score.py',
    source_directory='scripts/',
    environment=scoringenv
)
What I am doing here is creating an image with the Python dependencies (from scoring_env.yml) and passing a dictionary of the secrets as environment variables; the secrets themselves are stored in Key Vault. Note that you may pass variables of any native Python data type.
Now, in my score.py, I read these environment variables in init() like this:
import os

tenant_id = os.environ.get('tenant_id')
client_id = os.environ.get('client_id')
client_secret = os.environ.get('client_secret')
subscription_id = os.environ.get('subscription_id')
resource_group = os.environ.get('resource_group')
Once you have these variables, you may create a workspace object using Service Principal authentication, as @Anders Swanson mentioned in his reply.
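For completeness, here's a minimal sketch of what that could look like inside init(), built from the environment variables above (the workspace_name variable is my assumption; you would need to add it to the environment-variable dictionary as well):

from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

# Build a service-principal credential from the environment variables read above
sp_auth = ServicePrincipalAuthentication(
    tenant_id=tenant_id,
    service_principal_id=client_id,
    service_principal_password=client_secret
)

# Construct the workspace handle ('workspace_name' is assumed to be passed
# the same way as the other variables)
ws = Workspace(
    subscription_id=subscription_id,
    resource_group=resource_group,
    workspace_name=os.environ.get('workspace_name'),
    auth=sp_auth
)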
Another way to resolve this may be by using managed identities for AKS. I did not explore that option.
Hope this helps! Please let me know if you found a better way of solving this.
Thanks!

Does your score.py include a Workspace.get() call with auth=InteractiveLoginAuthentication? You should swap it for ServicePrincipalAuthentication (docs), to which you pass your credentials, ideally through environment variables.
import os

from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

svc_pr_password = os.environ.get("AZUREML_PASSWORD")

svc_pr = ServicePrincipalAuthentication(
    tenant_id="my-tenant-id",
    service_principal_id="my-application-id",
    service_principal_password=svc_pr_password)

ws = Workspace(
    subscription_id="my-subscription-id",
    resource_group="my-ml-rg",
    workspace_name="my-ml-workspace",
    auth=svc_pr
)

print("Found workspace {} at location {}".format(ws.name, ws.location))

You can get the workspace object directly from your run.
from azureml.core.run import Run
ws = Run.get_context().experiment.workspace
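Note that Run.get_context() only resolves a workspace inside a submitted run; outside of one it returns an offline run with no experiment attached. As a usage sketch (the dataset name is a placeholder), you can then resolve the registered dataset the original question needs:

from azureml.core import Dataset

# Look up the registered dataset by name once the workspace is available
dataset = Dataset.get_by_name(ws, name='my-registered-dataset')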

I came across the same challenge. Since you mention AML Datasets, I assume an AML Batch Endpoint suits your scenario. The scoring script for a batch endpoint is meant to receive a list of files as input. When invoking the batch endpoint, you can pass (among other things) AML Datasets; consider that an endpoint is deployed in the context of an AML workspace. Have a look at this.

When running on an AML compute cluster, use the following code:
from azureml.core import Run

run = Run.get_context()
ws = run.experiment.workspace
Note: this works only when the script runs on an AML cluster, inside a submitted run.
Run.get_context() gets the context of the current run on the AML cluster; from that object we can extract the workspace, which lets you authenticate to the AML workspace from the AML cluster.

Related

Azure Quantum authentication is nearly impossible with EnvironmentCredential

The code below, which solves a simple problem on Azure Quantum, never works. Microsoft, please help me with authentication.
from azure.identity import EnvironmentCredential
from azure.quantum import Workspace
from azure.quantum.optimization import ParallelTempering, Problem, ProblemType, Term

problem = Problem(name="My First Problem", problem_type=ProblemType.ising)

workspace = Workspace(
    subscription_id="my-subscription-id",   # Add your subscription_id
    resource_group="AzureQuantum",          # Add your resource_group
    name="my-workspace-name",               # Add your workspace name
    location="my-workspace-location",       # Add your workspace location (for example, "westus")
    credential=EnvironmentCredential(AZURE_USERNAME="my-email-id", AZURE_PASSWORD="my-microsoft-password")
    # credential=ManagedIdentityCredential()
)

terms = [
    Term(c=-9, indices=[0]),
    Term(c=-3, indices=[1, 0]),
    Term(c=5, indices=[2, 0]),
    Term(c=9, indices=[2, 1]),
    Term(c=2, indices=[3, 0]),
    Term(c=-4, indices=[3, 1]),
    Term(c=4, indices=[3, 2])
]
problem.add_terms(terms=terms)

solver = ParallelTempering(workspace, timeout=100)
result = solver.optimize(problem)
print(result)
The above code throws the error:
EnvironmentCredential.get_token failed: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to troubleshoot this issue.
---------------------------------------------------------------------------
CredentialUnavailableError                Traceback (most recent call last)
<ipython-input-19-90cd448f8194> in <module>()
      3 solver = ParallelTempering(workspace, timeout=100)
      4
----> 5 result = solver.optimize(problem)
      6 print(result)

19 frames
/usr/local/lib/python3.7/dist-packages/azure/identity/_credentials/environment.py in get_token(self, *scopes, **kwargs)
    141                 "this issue."
    142             )
--> 143             raise CredentialUnavailableError(message=message)
    144         return self._credential.get_token(*scopes, **kwargs)

CredentialUnavailableError: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to troubleshoot this issue.
Details
The above code works perfectly fine when I do not pass any credentials to the workspace, but that pops up a window for authentication. I do not want to click through the browser manually every time I run something in order to authenticate. I just want to pass the credentials in code, without having to deal with all the complicated options defined for authentication in the docs.
Note: I'm passing my email and password in the EnvironmentCredential (I'm obviously not writing confidential info here, like the subscription id passed in the workspace).
The EnvironmentCredential takes its parameters from environment variables, not from arguments to the constructor, so to avoid getting prompted for credentials you need to set the corresponding environment variables to the correct values and then run your program. Something like this (Windows syntax; use export on Linux/macOS):
set AZURE_USERNAME=my-email-id
set AZURE_PASSWORD=my-microsoft-password
python myprogram.py
In your code you would then do something like:
workspace = Workspace(
    subscription_id="my-subscription-id",   # Add your subscription_id
    resource_group="AzureQuantum",          # Add your resource_group
    name="my-workspace-name",               # Add your workspace name
    location="my-workspace-location",       # Add your workspace location (for example, "westus")
    credential=EnvironmentCredential()
)
That being said, an easier way to avoid being prompted for credentials is to install the Azure CLI and log in with az login; that prompts once for credentials and then persists them locally on your machine, so you don't have to log in again.
Optionally, if you are using VS Code, you can install the Azure Account extension.
With any of these options, you don't need to provide any credential in the Workspace constructor. It automatically tries to discover whether you've set the environment variables or logged in via the CLI or the extension, and uses that. It only falls back to prompting in the browser if it can't find anything else.
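As a minimal sketch of that last option, assuming you have already run az login (or set the environment variables), the constructor takes no credential argument at all:

from azure.quantum import Workspace

# No credential argument: the SDK walks its default discovery chain
# (environment variables, Azure CLI login, VS Code sign-in) and only
# falls back to the browser prompt if nothing is found.
workspace = Workspace(
    subscription_id="my-subscription-id",
    resource_group="AzureQuantum",
    name="my-workspace-name",
    location="my-workspace-location"
)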

Python SDK v2 for Azure Machine Learning SDK (preview) - how to retrieve workspace from WorkspaceOperations

I’m following this article to create ML pipelines with the new SDK.
So I started by loading the first class
from azure.ai.ml import MLClient
and then I used it to authenticate to my workspace:
ml_client = MLClient(
    credential=credential,
    subscription_id=subscription_id,
    resource_group_name=resource_group_name,
    workspace_name="mmAmlsWksp01",
)
However, I can’t understand how I can retrieve the objects it refers to. For example, it contains a “workspaces” member, but if I run
ml_client.workspaces["mmAmlsWksp01"]
, I get the error “'WorkspaceOperations' object is not subscriptable”.
So I tried to run
for w in ml_client.workspaces.list():
    print(w)
and it returns the workspace details (name, displayName, id…) for a SINGLE workspace, but not the workspace object.
In fact, the ml_client.workspaces object is a
<azure.ai.ml._operations.workspace_operations.WorkspaceOperations at 0x7f3526e45d60>
, but I don’t want a WorkspaceOperation, I want the Workspace itself. How can I retrieve it?
You need to use the get method on workspaces, e.g.
ml_client.workspaces.get("mmAmlsWksp01")
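A short usage sketch of the difference (the property names are the standard ones on the returned Workspace entity):

# get() returns the Workspace entity itself, not the operations class,
# so its properties are directly accessible.
ws = ml_client.workspaces.get("mmAmlsWksp01")
print(ws.name, ws.location)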

How to pass in the model name during init in Azure Machine Learning Service?

I am deploying 50 NLP models on Azure Container Instances via the Azure Machine Learning service. All 50 models are quite similar and have the same input/output format with just the model implementation changing slightly.
I want to write a generic score.py entry file and pass in the model name as a parameter. The interface method signature does not allow a parameter in the init() method of score.py, so I moved the model loading into the run method. I am assuming the init() method runs only once, whereas run(raw_data) executes on every invocation, so this is probably not ideal (the models are 1 GB in size).
So how can I pass in some value to the init() method of my container to tell it what model to load?
Here is my current, working code:
import fasttext
from azureml.core.model import Model

def loadModel(model_name):
    model_path = Model.get_model_path(model_name)
    return fasttext.load_model(model_path)

def init():
    pass  # nothing to load up front; the model is chosen per request

def run(raw_data):
    # extract model_name from raw_data omitted...
    model = loadModel(model_name)
    ...
but this is what I would like to do (which breaks the interface)
def init(model_name):
    model = loadModel(model_name)

def loadModel(model_name):
    model_path = Model.get_model_path(model_name)
    return fasttext.load_model(model_path)

def run(raw_data):
    ...
If you're looking to use the same deployed container and switch models between requests, that's not the preferred design for the Azure Machine Learning service; the model name to load needs to be specified at build/deploy time.
Ideally, each deployed web-service endpoint should serve inference for one model only, with the model name fixed before the container image is built and deployed.
It is mandatory that the entry script have both init() and run(raw_data), with those exact signatures.
At the moment, you can't change the signature of the init() method to take a parameter, as in init(model_name).
The only dynamic user input you ever get to pass into this web service arrives via the run(raw_data) method. As you have found, given the size of your models, passing one via run is not feasible.
init() runs first, and only once, after your web service is deployed. Even if init() took a model_name parameter, there isn't a straightforward way to call this method directly and pass your desired model name.
But, one possible solution is:
You can create a params file like the one below and store the file in Azure Blob Storage.
Example runtime parameters generation script:
import pickle

params = {'model_name': 'YOUR_MODEL_NAME_TO_USE'}
with open('runtime_params.pkl', 'wb') as file:
    pickle.dump(params, file)
You'll need to use the Azure Storage Python SDK to write the code that reads from your Blob Storage account; this is also mentioned in the official docs here.
Then you can access the file from the init() function in your score script.
Example score.py script:
import pickle

from azure.storage.blob import BlockBlobService

def init():
    global model
    block_blob_service = BlockBlobService(connection_string='your_connection_string')
    blob_item = block_blob_service.get_blob_to_bytes('your-container-name', 'runtime_params.pkl')
    params = pickle.loads(blob_item.content)  # loads(), since .content is a bytes object
    model = loadModel(params['model_name'])
You can store connection strings in Azure Key Vault for secure access; Azure ML workspaces come with built-in Key Vault integration. More info here.
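As a hedged sketch of that integration, assuming you already have a workspace handle ws (for example, via the service-principal approach from the first question) and using a placeholder secret name:

# Fetch the storage connection string from the workspace's default Key Vault
# ('storage-conn-str' is a placeholder secret name)
keyvault = ws.get_default_keyvault()
connection_string = keyvault.get_secret(name='storage-conn-str')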
With this approach, you're moving the runtime params config to another cloud location rather than baking it into the container itself, so you wouldn't need to re-build the image or re-deploy the web service. Simply restarting the container will work.
If you're looking to simply re-use score.py (without code changes) for multiple model deployments in multiple containers, here's another possible solution.
You can define the model name the web service should use in a file, read it in score.py, and pass that file as a dependency when setting up the image config.
This would, however, need a separate params file for each container deployment.
Passing 'runtime_params.pkl' as a dependency to your image config (more detailed example here):
image_config = ContainerImage.image_configuration(execution_script="score.py",
                                                  runtime="python",
                                                  conda_file="myenv.yml",
                                                  dependencies=["runtime_params.pkl"],
                                                  docker_file="Dockerfile")
Reading this in your score.py init() function:
def init():
    global model
    with open('runtime_params.pkl', 'rb') as file:
        params = pickle.load(file)
    model = loadModel(params['model_name'])
Since you're creating a new image config with this approach, you'll need to re-build the image and re-deploy the service.
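Not covered in the answers above, but worth noting for the 1 GB concern in the question: if several models must be served from one container, a lazy cache keyed by model name keeps the generic score.py while loading each model from disk at most once per container. A sketch, assuming a payload shape of my own invention:

import json

import fasttext
from azureml.core.model import Model

_model_cache = {}

def loadModel(model_name):
    # Load each model once and reuse it for all later requests
    if model_name not in _model_cache:
        model_path = Model.get_model_path(model_name)
        _model_cache[model_name] = fasttext.load_model(model_path)
    return _model_cache[model_name]

def init():
    pass  # nothing to preload; models are resolved per request

def run(raw_data):
    payload = json.loads(raw_data)  # assumed shape: {"model_name": ..., "text": ...}
    model = loadModel(payload['model_name'])
    labels, probabilities = model.predict(payload['text'])
    return {'labels': list(labels), 'probabilities': probabilities.tolist()}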

How to connect Google Datastore from a script in Python 3

We want to do some work with the data that is in Google Datastore. We already have a database; we would like to use Python 3 to handle the data and run queries from a script on our development machines. What would be the easiest way to accomplish this?
From the Official Documentation:
You will need to install the Cloud Datastore client library for Python:
pip install --upgrade google-cloud-datastore
Set up authentication by creating a service account and setting an environment variable. This is easier to follow visually, so please take a look at the official documentation for more info. You can perform this step using either the GCP console or the command line.
Then you will be able to connect to your Cloud Datastore client and use it, as in the example below:
# Imports the Google Cloud client library
from google.cloud import datastore
# Instantiates a client
datastore_client = datastore.Client()
# The kind for the new entity
kind = 'Task'
# The name/ID for the new entity
name = 'sampletask1'
# The Cloud Datastore key for the new entity
task_key = datastore_client.key(kind, name)
# Prepares the new entity
task = datastore.Entity(key=task_key)
task['description'] = 'Buy milk'
# Saves the entity
datastore_client.put(task)
print('Saved {}: {}'.format(task.key.name, task['description']))
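Since the question also asks about running queries, here is a short follow-up sketch using the same client (the kind and filter values are placeholders):

# Query all 'Task' entities with a matching description
query = datastore_client.query(kind='Task')
query.add_filter('description', '=', 'Buy milk')
for task in query.fetch():
    print(task.key.name, task['description'])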
As @JohnHanley mentioned, you will find a good example in this Bookshelf app tutorial, which uses Cloud Datastore to store the books' persistent data and metadata.
You can create a service account, download its credentials as JSON, and then set an environment variable called GOOGLE_APPLICATION_CREDENTIALS pointing to the JSON file. You can see the details at the link below.
https://googleapis.dev/python/google-api-core/latest/auth.html
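Alternatively, a small sketch of pointing the client at the credentials file explicitly instead of relying on the environment variable (the path is a placeholder):

from google.cloud import datastore

# Build the client directly from the downloaded service-account JSON
datastore_client = datastore.Client.from_service_account_json(
    '/path/to/service-account.json')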

Get list of application packages available for a batch account of Azure Batch

I'm making a Python app that launches a batch job.
I want to create a pool via user inputs.
For simplicity, I'll just add all the applications present in the batch account to the pool.
However, I'm not able to get the list of available application packages.
This is the portion of code:
import azure.batch.batch_service_client as batch
from azure.common.credentials import ServicePrincipalCredentials

credentials = ServicePrincipalCredentials(
    client_id='xxxxx',
    secret='xxxxx',
    tenant='xxxx',
    resource="https://batch.core.windows.net/"
)

batch_client = batch.BatchServiceClient(
    credentials,
    base_url=self.AppData['CloudSettings'][0]['BatchAccountURL'])

# Get list of applications
batchApps = batch_client.application.list()
I can create a pool, so the credentials are good and there are applications, but the returned list is empty.
Can anybody help me with this?
Thank you,
Guido
Update:
I tried:
import azure.batch.batch_service_client as batch
batchApps = batch.ApplicationOperations.list(batch_client)
and
import azure.batch.operations as batch_operations
batchApps = batch_operations.ApplicationOperations.list(batch_client)
but they don't seem to work. batchApps is always empty.
I don't think it's an authentication issue, since I'd get an error otherwise.
At this point I wonder if it's just a bug in the Python SDK?
The SDK version I'm using is:
azure.batch: 4.1.3
azure: 4.0.0
(Screenshot of the empty batchApps variable omitted.)
Is this what you are looking for?
Understanding the application package concept: https://learn.microsoft.com/en-us/azure/batch/batch-application-packages
Since it's the Python SDK in action here: https://learn.microsoft.com/en-us/python/api/azure-batch/azure.batch.operations.applicationoperations?view=azure-python (see the list and get operations).
Hope this helps.
I haven't tried the Azure Python SDK lately, but the way I solved this was to use the Azure Batch REST API:
https://learn.microsoft.com/en-us/rest/api/batchservice/application/list
For authorization, I had to create an application, give it access to the Batch service, and then I programmatically generated the token with the following request:
import requests

data = {'grant_type': 'client_credentials',
        'client_id': clientId,
        'client_secret': clientSecret,
        'resource': 'https://batch.core.windows.net/'}

postReply = requests.post('https://login.microsoftonline.com/' + tenantId + '/oauth2/token', data)
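A sketch of the follow-up call, assuming the token request above succeeded (batchAccountUrl is a placeholder for your account URL, and you should check the docs for a current api-version value):

# Extract the bearer token and call the list-applications REST operation
token = postReply.json()['access_token']
headers = {'Authorization': 'Bearer ' + token}
resp = requests.get(batchAccountUrl + '/applications?api-version=2020-09-01.12.0',
                    headers=headers)
print(resp.json())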
