Unable to build local AMLS environment with private wheel - azure-machine-learning-service

I am trying to write a small program using the AzureML Python SDK (v1.0.85) to register an Environment in AMLS and use that definition to construct a local Conda environment when experiments are being run (for a pre-trained model). The code works fine for simple scenarios where all dependencies are loaded from Conda/public PyPI, but when I introduce a private dependency (e.g. a utils library) I get an InternalServerError with the message "Error getting recipe specifications".
The code I am using to register the environment is (after having authenticated to Azure and connected to our workspace):
from pathlib import Path
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
environment_name = config['environment']['name']
py_version = "3.7"
conda_packages = ["pip"]
pip_packages = ["azureml-defaults"]
private_packages = ["./env-wheels/utils-0.0.3-py3-none-any.whl"]
print(f"Creating environment with name {environment_name}")
environment = Environment(name=environment_name)
conda_deps = CondaDependencies()
print(f"Adding Python version: {py_version}")
conda_deps.set_python_version(py_version)
for conda_pkg in conda_packages:
    print(f"Adding Conda dependency: {conda_pkg}")
    conda_deps.add_conda_package(conda_pkg)
for pip_pkg in pip_packages:
    print(f"Adding Pip dependency: {pip_pkg}")
    conda_deps.add_pip_package(pip_pkg)
for private_pkg in private_packages:
    print(f"Uploading private wheel from {private_pkg}")
    # Upload the wheel to the workspace storage and get back its URL
    private_pkg_url = Environment.add_private_pip_wheel(workspace=ws, file_path=Path(private_pkg).absolute(), exist_ok=True)
    print(f"Adding private Pip dependency: {private_pkg_url}")
    conda_deps.add_pip_package(private_pkg_url)
environment.python.conda_dependencies = conda_deps
environment.register(workspace=ws)
And the code I am using to create the local Conda environment is:
amls_environment = Environment.get(ws, name=environment_name, version=environment_version)
print(f"Building environment...")
amls_environment.build_local(workspace=ws)
The exact error message being returned when build_local(...) is called is:
Traceback (most recent call last):
File "C:\Anaconda\envs\AMLSExperiment\lib\site-packages\azureml\core\environment.py", line 814, in build_local
raise error
File "C:\Anaconda\envs\AMLSExperiment\lib\site-packages\azureml\core\environment.py", line 807, in build_local
recipe = environment_client._get_recipe_for_build(name=self.name, version=self.version, **payload)
File "C:\Anaconda\envs\AMLSExperiment\lib\site-packages\azureml\_restclient\environment_client.py", line 171, in _get_recipe_for_build
raise Exception(message)
Exception: Error getting recipe specifications. Code: 500
: {
"error": {
"code": "ServiceError",
"message": "InternalServerError",
"detailsUri": null,
"target": null,
"details": [],
"innerError": null,
"debugInfo": null
},
"correlation": {
"operation": "15043e1469e85a4c96a3c18c45a2af67",
"request": "19231be75a2b8192"
},
"environment": "westeurope",
"location": "westeurope",
"time": "2020-02-28T09:38:47.8900715+00:00"
}
Process finished with exit code 1
Has anyone seen this error before, or is anyone able to provide some guidance on what the issue may be?

The issue was with our firewall blocking the required requests between AMLS and the storage container (I presume to get the environment definitions/private wheels).
We resolved this by updating the firewall with appropriate ALLOW rules for the AMLS service to contact and read from the attached storage container.

Assuming that you'd like to run the script on a remote compute, my suggestion would be to pass the environment you just "got" to a RunConfiguration, then pass that to a ScriptRunConfig, Estimator, or PythonScriptStep:
from azureml.core import Environment, ScriptRunConfig
from azureml.core.runconfig import DEFAULT_CPU_IMAGE
src = ScriptRunConfig(source_directory=project_folder, script='train.py')
# Set compute target to the one created in previous step
src.run_config.target = cpu_cluster.name
# Set environment
amls_environment = Environment.get(ws, name=environment_name, version=environment_version)
src.run_config.environment = amls_environment
run = experiment.submit(config=src)
run
Check out the rest of the notebook here.
If you're looking for a local run this notebook might help.

Related

error in installing knn plugin for elasticsearch

I am trying to install the KNN plugin to get the following code working:
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
es = Elasticsearch('http://127.0.0.1:9200', verify_certs=False)
settings = {
    "settings": {
        "index": {
            "knn": "true",
            "knn.space_type": "cosinesimil"
        }
    },
    "mappings": {
        "dynamic": "true",
        "_source": {
            "enabled": "true"
        },
        "properties": {
            "vector": {
                "type": "knn_vector",
                "dimension": "768"
            }
        }
    }
}
es.indices.create(index='document_embeddings', ignore=[400, 404], body=settings)
but am getting the following error:
{'error': {'root_cause': [{'type': 'illegal_argument_exception',
'reason': 'unknown setting [index.knn] please check that any required plugins are installed, or check the breaking changes documentation for removed settings'}],
'type': 'illegal_argument_exception',
'reason': 'unknown setting [index.knn] please check that any required plugins are installed, or check the breaking changes documentation for removed settings',
'suppressed': [{'type': 'illegal_argument_exception',
'reason': 'unknown setting [index.knn.space_type] please check that any required plugins are installed, or check the breaking changes documentation for removed settings'}]},
'status': 400}
I think the cause is that the KNN plugin is not installed, so I am installing it in the following way, where the zip file is downloaded from here.
-> Installing file://elastiknn-8.6.1.0.zip
-> Downloading file://elastiknn-8.6.1.0.zip
-> Failed installing file://elastiknn-8.6.1.0.zip
-> Rolling back file://elastiknn-8.6.1.0.zip
-> Rolled back file://elastiknn-8.6.1.0.zip
Exception in thread "main" java.net.UnknownHostException: elastiknn-8.6.1.0.zip
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:560)
at java.base/java.net.Socket.connect(Socket.java:666)
at java.base/sun.net.ftp.impl.FtpClient.doConnect(FtpClient.java:1045)
at java.base/sun.net.ftp.impl.FtpClient.tryConnect(FtpClient.java:1010)
at java.base/sun.net.ftp.impl.FtpClient.connect(FtpClient.java:1102)
at java.base/sun.net.ftp.impl.FtpClient.connect(FtpClient.java:1088)
at java.base/sun.net.www.protocol.ftp.FtpURLConnection.connect(FtpURLConnection.java:318)
at java.base/sun.net.www.protocol.ftp.FtpURLConnection.getInputStream(FtpURLConnection.java:424)
at org.elasticsearch.plugins.cli.InstallPluginAction.downloadZip(InstallPluginAction.java:465)
at org.elasticsearch.plugins.cli.InstallPluginAction.download(InstallPluginAction.java:329)
at org.elasticsearch.plugins.cli.InstallPluginAction.execute(InstallPluginAction.java:247)
at org.elasticsearch.plugins.cli.InstallPluginCommand.execute(InstallPluginCommand.java:89)
at org.elasticsearch.common.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:54)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:85)
at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:94)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:85)
at org.elasticsearch.cli.Command.main(Command.java:50)
at org.elasticsearch.launcher.CliToolLauncher.main(CliToolLauncher.java:64)
The command that I am running to install the plugin is:
sudo bin/elasticsearch-plugin install file://elastiknn-8.6.1.0.zip
My Elasticsearch version, 8.6.1, seems perfectly compatible with that of elastiknn. Can anyone please tell me how I can get a working setup of KNN with Elasticsearch on my Mac? Thanks!
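Use the dense_vector field type instead of knn_vector. Elasticsearch 8.x supports kNN search on dense_vector fields natively, so no plugin is needed. Read this documentation.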
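As a minimal sketch of what this answer points to, the mapping below swaps the plugin-specific knn_vector type for the built-in dense_vector type. It assumes Elasticsearch 8.x, where a dense_vector field can be indexed for kNN search with "index": true and a "similarity" metric. The helper name build_vector_mapping is hypothetical; the index name and the 768 dimensions are taken from the question.

```python
def build_vector_mapping(dims: int = 768, similarity: str = "cosine") -> dict:
    """Return index mappings with a kNN-searchable dense_vector field."""
    return {
        "dynamic": True,
        "_source": {"enabled": True},
        "properties": {
            "vector": {
                "type": "dense_vector",    # built-in type, no plugin required
                "dims": dims,              # must match the embedding size
                "index": True,             # enable approximate kNN search
                "similarity": similarity,  # e.g. "cosine", "dot_product", "l2_norm"
            }
        },
    }


mappings = build_vector_mapping()

# With a running cluster and the 8.x Python client, the index could then be
# created roughly like this (assumption, not executed here):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch("http://127.0.0.1:9200")
#   es.indices.create(index="document_embeddings", mappings=mappings)
```

Note that the "index.knn" settings from the question are dropped entirely, since they belong to a plugin and are what triggered the "unknown setting" error.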

Local hosting python azure function fail with M1

As the title says, I want to host an Azure Function locally with VS Code, but I get an error.
Python version 3.9.12 (python3).
Azure Functions Core Tools
Core Tools Version: 4.0.4483 Commit hash: N/A (64-bit)
Function Runtime Version: 4.1.3.17473
host.json:
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[2.*, 3.0.0)"
  }
}
local.settings.json:
{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureWebJobsStorage": ""
  }
}
Error Message:
Functions:
HttpTrigger1: [GET,POST] http://localhost:7071/api/HttpTrigger1
For detailed output, run func with --verbose flag.
....
[2022-05-09T06:52:10.300Z] from . import dispatcher
[2022-05-09T06:52:10.300Z] File "/opt/homebrew/Cellar/azure-functions-core-tools@4/4.0.4483/workers/python/3.9/OSX/X64/azure_functions_worker/dispatcher.py", line 19, in <module>
[2022-05-09T06:52:10.300Z] import grpc
[2022-05-09T06:52:10.300Z] File "/opt/homebrew/Cellar/azure-functions-core-tools@4/4.0.4483/workers/python/3.9/OSX/X64/grpc/__init__.py", line 23, in <module>
[2022-05-09T06:52:10.300Z] from grpc._cython import cygrpc as _cygrpc
[2022-05-09T06:52:10.300Z] ImportError: dlopen(/opt/homebrew/Cellar/azure-functions-core-tools@4/4.0.4483/workers/python/3.9/OSX/X64/grpc/_cython/cygrpc.cpython-39-darwin.so, 0x0002): tried: '/opt/homebrew/Cellar/azure-functions-core-tools@4/4.0.4483/workers/python/3.9/OSX/X64/grpc/_cython/cygrpc.cpython-39-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e')), '/usr/local/lib/cygrpc.cpython-39-darwin.so' (no such file), '/usr/lib/cygrpc.cpython-39-darwin.so' (no such file)
[2022-05-09T06:52:13.512Z] Host lock lease acquired by instance ID '0000000000000000000000008F1C7F2E'.
After reproducing this on our end, we observed that an arm64 Python can never load an x86_64 shared library, so we need to enable Rosetta, which works on a process-by-process level.
Steps to follow:
Check the "Open using Rosetta" option for iTerm.
Install Homebrew, then Azure Functions Core Tools and Python, from that Rosetta-enabled terminal.
Then run your Azure Function.
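To confirm which architecture the Python interpreter is actually running as, before and after switching to a Rosetta terminal, a small check like the following may help. It is only a sketch: it uses the standard library's platform.machine(), and the helper name interpreter_arch is made up for illustration.

```python
import platform
import sys


def interpreter_arch() -> str:
    """Return the CPU architecture reported for the running Python process."""
    return platform.machine()


arch = interpreter_arch()
print(f"Python {sys.version_info.major}.{sys.version_info.minor} is running as: {arch}")

if arch == "arm64":
    # Native Apple Silicon process: x86_64-only libraries such as the bundled
    # cygrpc .so from the traceback cannot be loaded into it.
    print("arm64 process: x86_64 shared libraries will fail to load.")
elif arch == "x86_64":
    # On an M1 Mac this means the process is running under Rosetta.
    print("x86_64 process: x86_64 shared libraries can be loaded.")
```

Running it with the same interpreter that the Functions host picks up shows whether the Rosetta setup took effect.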
REFERENCES:
Support running on M1 Macs [Python]

unable to initialize snowflake data source

I am trying to access a Snowflake datasource using the great_expectations library.
The following is what I tried so far:
from ruamel import yaml
import great_expectations as ge
from great_expectations.core.batch import BatchRequest, RuntimeBatchRequest
context = ge.get_context()
datasource_config = {
    "name": "my_snowflake_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": "snowflake://myusername:mypass@myaccount/myDB/myschema?warehouse=mywh&role=myadmin",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["default_identifier_name"],
        },
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetSqlDataConnector",
            "include_schema_name": True,
        },
    },
}
print(context.test_yaml_config(yaml.dump(datasource_config)))
I initialized great_expectations before executing the above code:
great_expectations init
but I am getting the error below:
great_expectations.exceptions.exceptions.DatasourceInitializationError: Cannot initialize datasource my_snowflake_datasource, error: 'NoneType' object has no attribute 'create_engine'
What am I doing wrong?
Your configuration seems to be ok, corresponding to the example here.
If you look at the traceback you should notice that the error propagates starting at the file great_expectations/execution_engine/sqlalchemy_execution_engine.py in your virtual environment.
The actual line where the error occurs is:
self.engine = sa.create_engine(connection_string, **kwargs)
And if you search for that sa at the top of that file:
try:
    import sqlalchemy as sa
    make_url = import_make_url()
except ImportError:
    sa = None
So sqlalchemy is not installed; you don't get it automatically in your environment when you install great_expectations. The thing to do is to install snowflake-sqlalchemy, since you want to use sqlalchemy's snowflake plugin (an assumption based on your connection_string).
/your/virtualenv/bin/python -m pip install snowflake-sqlalchemy
After that you should no longer get this error. Instead, it looks like test_yaml_config waits for the connection to time out.
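Because great_expectations silently sets sa to None when the import fails, it can be useful to check up front whether the modules are importable in the interpreter you are actually running. The following is a rough sketch using only the standard library; is_importable is a hypothetical helper, and snowflake.sqlalchemy is assumed to be the module installed by the snowflake-sqlalchemy package.

```python
import importlib.util


def is_importable(name: str) -> bool:
    """Return True if a module with this (possibly dotted) name can be found."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when the parent package itself is missing.
        return False


# Modules the Snowflake datasource is assumed to need at runtime.
required = ["sqlalchemy", "snowflake.sqlalchemy"]
missing = [name for name in required if not is_importable(name)]

if missing:
    print("Missing modules:", ", ".join(missing))
    print("Try: python -m pip install snowflake-sqlalchemy")
else:
    print("sqlalchemy and the Snowflake dialect are importable.")
```

Run it with the same interpreter as the failing script (for example /your/virtualenv/bin/python), since a package installed into a different environment will not be found.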
What worries me greatly is the documented use of a deprecated API of ruamel.yaml.
The function ruamel.yaml.dump is going to be removed in the near future, and you
should use the .dump() method of a ruamel.yaml.YAML() instance.
You should use the following code instead:
import sys
from ruamel.yaml import YAML
import great_expectations as ge
context = ge.get_context()
datasource_config = {
    "name": "my_snowflake_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": "snowflake://myusername:mypass@myaccount/myDB/myschema?warehouse=mywh&role=myadmin",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["default_identifier_name"],
        },
        "default_inferred_data_connector_name": {
            "class_name": "InferredAssetSqlDataConnector",
            "include_schema_name": True,
        },
    },
}
yaml = YAML()
yaml.dump(datasource_config, sys.stdout, transform=context.test_yaml_config)
I'll make a PR for great_expectations to update their documentation/use of ruamel.yaml.

Azure-ML Deployment does NOT see AzureML Environment (wrong version number)

I've followed the documentation pretty well as outlined here.
I've set up my Azure Machine Learning environment the following way:
from azureml.core import Workspace
# Connect to the workspace
ws = Workspace.from_config()
from azureml.core import Environment
from azureml.core import ContainerRegistry
myenv = Environment(name = "myenv")
myenv.inferencing_stack_version = "latest" # This will install the inference specific apt packages.
# Docker
myenv.docker.enabled = True
myenv.docker.base_image_registry.address = "myazureregistry.azurecr.io"
myenv.docker.base_image_registry.username = "myusername"
myenv.docker.base_image_registry.password = "mypassword"
myenv.docker.base_image = "4fb3..."
myenv.docker.arguments = None
# Environment variable (I need python to look at folders)
myenv.environment_variables = {"PYTHONPATH":"/root"}
# python
myenv.python.user_managed_dependencies = True
myenv.python.interpreter_path = "/opt/miniconda/envs/myenv/bin/python"
from azureml.core.conda_dependencies import CondaDependencies
conda_dep = CondaDependencies()
conda_dep.add_pip_package("azureml-defaults")
myenv.python.conda_dependencies=conda_dep
myenv.register(workspace=ws) # works!
I have a score.py file configured for inference (not relevant to the problem I'm having)...
I then set up the inference configuration:
from azureml.core.model import InferenceConfig
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)
I set up my compute cluster:
from azureml.core.compute import ComputeTarget, AksCompute
from azureml.exceptions import ComputeTargetException
# Choose a name for your cluster
aks_name = "theclustername"
# Check to see if the cluster already exists
try:
    aks_target = ComputeTarget(workspace=ws, name=aks_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    prov_config = AksCompute.provisioning_configuration(vm_size="Standard_NC6_Promo")
    aks_target = ComputeTarget.create(workspace=ws, name=aks_name, provisioning_configuration=prov_config)
    aks_target.wait_for_completion(show_output=True)
from azureml.core.webservice import AksWebservice
# Example
gpu_aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,
                                                    num_replicas=3,
                                                    cpu_cores=4,
                                                    memory_gb=10)
Everything succeeds; then I try and deploy the model for inference:
from azureml.core.model import Model
model = Model(ws, name="thenameofmymodel")
# Name of the web service that is deployed
aks_service_name = 'tryingtodeply'
# Deploy the model
aks_service = Model.deploy(ws,
                           aks_service_name,
                           models=[model],
                           inference_config=inference_config,
                           deployment_config=gpu_aks_config,
                           deployment_target=aks_target,
                           overwrite=True)
aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)
And it fails saying that it can't find the environment. More specifically, my environment version is version 11, but it keeps trying to find an environment with a version number that is 1 higher (i.e., version 12) than the current environment:
Failed
ERROR - Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: 0f03a025-3407-4dc1-9922-a53cc27267d4
More information can be found here:
Error:
{
"code": "BadRequest",
"statusCode": 400,
"message": "The request is invalid",
"details": [
{
"code": "EnvironmentDetailsFetchFailedUserError",
"message": "Failed to fetch details for Environment with Name: myenv Version: 12."
}
]
}
I have tried to manually edit the environment JSON to match the version that azureml is trying to fetch, but nothing works. Can anyone see anything wrong with this code?
Update
Changing the name of the environment (e.g., my_inference_env) and passing it to InferenceConfig seems to be on the right track. However, the error now changes to the following:
Running..........
Failed
ERROR - Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: f0dfc13b-6fb6-494b-91a7-de42b9384692
More information can be found here: https://some_long_http_address_that_leads_to_nothing
Error:
{
"code": "DeploymentFailed",
"statusCode": 404,
"message": "Deployment not found"
}
Solution
The answer from Anders below is indeed correct regarding the use of azure ML environments. However, the last error I was getting was because I was setting the container image using the digest value (a sha) and NOT the image name and tag (e.g., imagename:tag). Note the line of code in the first block:
myenv.docker.base_image = "4fb3..."
I reference the digest value, but it should be changed to
myenv.docker.base_image = "imagename:tag"
Once I made that change, the deployment succeeded! :)
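A small guard can catch this kind of mistake before deployment. The sketch below is purely illustrative and not part of the AzureML SDK: looks_like_digest and check_base_image are hypothetical helpers, and the bare-hex check is only a heuristic for values that look like a sha digest rather than an image name and tag.

```python
import re

# Heuristic: a bare run of 12-64 hex characters looks like an image ID/digest.
_HEX_ONLY = re.compile(r"^[0-9a-fA-F]{12,64}$")


def looks_like_digest(reference: str) -> bool:
    """Return True if the value looks like a digest rather than name:tag."""
    ref = reference.strip()
    return (
        ref.startswith("sha256:")
        or "@sha256:" in ref
        or bool(_HEX_ONLY.match(ref))
    )


def check_base_image(reference: str) -> str:
    """Return the reference unchanged, or raise if it looks like a digest."""
    if looks_like_digest(reference):
        raise ValueError(
            f"{reference!r} looks like a digest; use 'imagename:tag' instead"
        )
    return reference


# Usage (the assignment target is the attribute from the question):
#   myenv.docker.base_image = check_base_image("imagename:tag")
print(check_base_image("imagename:tag"))
```

A legitimate image whose name happens to be all hex characters would be rejected by this heuristic, so treat it as a convenience check only.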
One concept that took me a while to get was the bifurcation of registering and using an Azure ML Environment. If you have already registered your env, myenv, and none of the details of your environment have changed, there is no need to re-register it with myenv.register(). You can simply get the already registered env using Environment.get() like so:
myenv = Environment.get(ws, name='myenv', version=11)
My recommendation would be to name your environment something new: like "model_scoring_env". Register it once, then pass it to the InferenceConfig.

Unable to host docker image from azure registry to azure batch

I am new to Docker as well as Azure Batch. I have two .NET console applications: one runs locally (and creates the pool, job, and task on Azure Batch programmatically), and for the second one I have created a Docker image and pushed it to Azure Container Registry. The problem is that when I create the CloudTask from the locally running application as shown below
TaskContainerSettings cmdContainerSettings = new TaskContainerSettings(
    imageName: "myrepository.azurecr.io/pipeline:latest",
    containerRunOptions: "--rm"
);
CloudTask containerTask = new CloudTask(
    id: "task1",
    commandline: cmdLine);
containerTask.ContainerSettings = cmdContainerSettings;
Console.WriteLine("Task created");
await batchClient.JobOperations.AddTaskAsync(newJobId, containerTask);
Console.WriteLine("-----------------------");
and add it to the BatchClient, the exception I get in Azure Batch (Azure portal) is this:
System.UnauthorizedAccessException: Access to the path '/home/_azbatch/.dotnet' is denied. ---> System.IO.IOException: Permission denied
--- End of inner exception stack trace ---
What can be the problem? Thank you.
As the comment ended up being the answer, I'm posting it here for clarity for future viewers:
The task needs to be run with elevated rights.
For example:
containerTask.UserIdentity = new UserIdentity(new AutoUserSpecification(elevationLevel: ElevationLevel.Admin, scope: AutoUserScope.Task));
See the docs for more info
I am still not able to pull the image from Docker. I am using Node.js; the following is the config for creating the task:
const taskConfig = {
    "id": "task-new-2",
    "commandLine": "bash -c 'node index.js'",
    "containerSettings": {
        "imageName": "xxx.xx.io/xx-test:latest",
        "containerRunOptions": "--rm",
        "username": "xxx",
        "password": "tfDlZ",
        "registryServer": "xxx.xx.io",
        // "workingDirectory": "AZ_BATCH_NODE_ROOT_DIR"
    },
    "userIdentity": {
        "autoUser": {
            "scope": "pool",
            "elevationLevel": "admin"
        }
    }
}

Resources