Input format for TensorFlow models on GCP AI Platform - python-3.x

I have uploaded a model to GCP AI Platform Models. It's a simple multistep Keras model with 5 features, trained on 168 lagged values. When I try to test the model, I get this strange error message:
"error": "Prediction failed: Error during model execution: <_MultiThreadedRendezvous of RPC that terminated with:\n\tstatus = StatusCode.FAILED_PRECONDITION\n\tdetails = \"Error while reading resource variable dense_7/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/dense_7/bias)\n\t [[{{node model_2/dense_7/BiasAdd/ReadVariableOp}}]]\"\n\tdebug_error_string = \"{\"created\":\"#1618946146.138507164\",\"description\":\"Error received from peer ipv4:127.0.0.1:8081\",\"file\":\"src/core/lib/surface/call.cc\",\"file_line\":1061,\"grpc_message\":\"Error while reading resource variable dense_7/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/dense_7/bias)\\n\\t [[{{node model_2/dense_7/BiasAdd/ReadVariableOp}}]]\",\"grpc_status\":9}\"\n>"
The input is in the following format: a list of shape (1, 168, 5).
See the example below:
{
  "instances":
    [[[ 3.10978284e-01,  2.94650396e-01,  8.83664149e-01,
        1.60210423e+00, -1.47402699e+00],
      [ 3.10978284e-01,  2.94650396e-01,  5.23466315e-01,
        1.60210423e+00, -1.47402699e+00],
      [ 8.68576328e-01,  7.78699823e-01,  2.83334426e-01,
        1.60210423e+00, -1.47402699e+00]]]
}
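For reference, a minimal sketch of sending such a payload to the deployed model with the Google API Python client (the project, model, and version names are placeholders, and the dummy rows merely illustrate the (1, 168, 5) shape):

from googleapiclient import discovery

# Build a client for the AI Platform (ml/v1) prediction service.
service = discovery.build("ml", "v1")
name = "projects/MY_PROJECT/models/MY_MODEL/versions/MY_VERSION"  # placeholders

# One instance of 168 timesteps x 5 features, matching the shape above.
row = [3.10978284e-01, 2.94650396e-01, 8.83664149e-01, 1.60210423e+00, -1.47402699e+00]
body = {"instances": [[row] * 168]}

response = service.projects().predict(name=name, body=body).execute()
if "error" in response:
    raise RuntimeError(response["error"])
print(response["predictions"])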

Related

How to format the file path in an MLTable for Azure Machine Learning uploaded during a pipeline job?

How is the path to a (.csv) file to be expressed in an MLTable file
that is created in a local folder but then uploaded as part of a
pipeline job?
I'm following the Jupyter notebook automl-forecasting-task-energy-demand-advance from the azureml-examples repo (article and notebook). This example has an MLTable file (shown below) referencing a .csv file with a relative path. Then in the pipeline the MLTable is uploaded to be accessible to a remote compute (a few things are omitted for brevity):
my_training_data_input = Input(
    type=AssetTypes.MLTABLE, path="./data/training-mltable-folder"
)
compute = AmlCompute(
    name=compute_name, size="STANDARD_D2_V2", min_instances=0, max_instances=4
)
forecasting_job = automl.forecasting(
    compute=compute_name,  # name of the compute target we created above
    # name="dpv2-forecasting-job-02",
    experiment_name=exp_name,
    training_data=my_training_data_input,
    # validation_data = my_validation_data_input,
    target_column_name="demand",
    primary_metric="NormalizedRootMeanSquaredError",
    n_cross_validations="auto",
    enable_model_explainability=True,
    tags={"my_custom_tag": "My custom value"},
)
returned_job = ml_client.jobs.create_or_update(forecasting_job)
ml_client.jobs.stream(returned_job.name)
But running this gives the following error message:
Encountered user error while fetching data from Dataset. Error: UserErrorException:
Message: MLTable yaml schema is invalid:
Error Code: Validation
Validation Error Code: Invalid MLTable
Validation Target: MLTableToDataflow
Error Message: Failed to convert a MLTable to dataflow
uri path is not a valid datastore uri path
| session_id=857bd9a1-097b-4df6-aa1c-8871f89580d8
InnerException None
ErrorResponse
{
  "error": {
    "code": "UserError",
    "message": "MLTable yaml schema is invalid: \nError Code: Validation\nValidation Error Code: Invalid MLTable\nValidation Target: MLTableToDataflow\nError Message: Failed to convert a MLTable to dataflow\nuri path is not a valid datastore uri path\n| session_id=857bd9a1-097b-4df6-aa1c-8871f89580d8"
  }
}
For reference, the MLTable file is:
paths:
  - file: ./nyc_energy_training_clean.csv
transformations:
  - read_delimited:
      delimiter: ','
      encoding: 'ascii'
  - convert_column_types:
      - columns: demand
        column_type: float
      - columns: precip
        column_type: float
      - columns: temp
        column_type: float
How am I supposed to run this? Thanks in advance!
For a remote path you can use the approach sketched below; see the documentation for creating data assets.
It's important to note that the path specified in the MLTable file must be a valid path in the cloud, not just a valid path on your local machine.
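A minimal sketch of registering the MLTable folder as a data asset with the v2 SDK, so the job receives a datastore-backed path rather than a purely local one (the asset name and version are illustrative):

from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Register the local MLTable folder as a data asset; on creation the folder
# is uploaded to the workspace datastore, giving it a valid cloud path.
training_data_asset = Data(
    path="./data/training-mltable-folder",
    type=AssetTypes.MLTABLE,
    name="energy-training-mltable",  # illustrative name
    version="1",
)
ml_client.data.create_or_update(training_data_asset)

# The job input can then reference the registered asset:
my_training_data_input = Input(type=AssetTypes.MLTABLE, path="azureml:energy-training-mltable:1")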

AWS SAM template error: 'collections.OrderedDict' object has no attribute 'startswith'

I am getting this error while using a SAM template to deploy resources.
Below is the script:
- sam package --template-file test.json --s3-bucket $s3_bucket --s3-prefix packages/my_folder/ --output-template-file samtemplate.yml
I get this error even after rolling back to the previous working state:
File "/usr/local/lib/python3.8/site-packages/samcli/lib/providers/sam_stack_provider.py", line 250, in
    return any([url.startswith(prefix) for prefix in ["s3://", "http://", "https://"]])
AttributeError: 'collections.OrderedDict' object has no attribute 'startswith'
After adding some debug messages I got this error:
2021-04-22 06:42:32,820 | Unable to resolve property S3bucketname: OrderedDict([('Fn::Select', ['0', OrderedDict([('Fn::Split', ['/', OrderedDict([('Ref', 'TemplateS3BucketName')])])])])]). Leaving as is.
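For context, that OrderedDict is an unresolved CloudFormation intrinsic: sam package expects S3bucketname to already be a plain string, but the template (reconstructed here from the debug output above, not taken from the actual test.json) passes it something like:

"S3bucketname": {
    "Fn::Select": ["0", { "Fn::Split": ["/", { "Ref": "TemplateS3BucketName" }] }]
}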

Azure-ML Deployment does NOT see AzureML Environment (wrong version number)

I've followed the documentation pretty well as outlined here.
I've set up my Azure Machine Learning environment the following way:
from azureml.core import Workspace
# Connect to the workspace
ws = Workspace.from_config()
from azureml.core import Environment
from azureml.core import ContainerRegistry
myenv = Environment(name = "myenv")
myenv.inferencing_stack_version = "latest" # This will install the inference specific apt packages.
# Docker
myenv.docker.enabled = True
myenv.docker.base_image_registry.address = "myazureregistry.azurecr.io"
myenv.docker.base_image_registry.username = "myusername"
myenv.docker.base_image_registry.password = "mypassword"
myenv.docker.base_image = "4fb3..."
myenv.docker.arguments = None
# Environment variables (I need Python to look at these folders)
myenv.environment_variables = {"PYTHONPATH":"/root"}
# python
myenv.python.user_managed_dependencies = True
myenv.python.interpreter_path = "/opt/miniconda/envs/myenv/bin/python"
from azureml.core.conda_dependencies import CondaDependencies
conda_dep = CondaDependencies()
conda_dep.add_pip_package("azureml-defaults")
myenv.python.conda_dependencies = conda_dep
myenv.register(workspace=ws) # works!
I have a score.py file configured for inference (not relevant to the problem I'm having)...
I then set up the inference configuration:
from azureml.core.model import InferenceConfig
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)
I set up my compute cluster:
from azureml.core.compute import ComputeTarget, AksCompute
from azureml.exceptions import ComputeTargetException
# Choose a name for your cluster
aks_name = "theclustername"
# Check to see if the cluster already exists
try:
    aks_target = ComputeTarget(workspace=ws, name=aks_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    prov_config = AksCompute.provisioning_configuration(vm_size="Standard_NC6_Promo")
    aks_target = ComputeTarget.create(workspace=ws, name=aks_name, provisioning_configuration=prov_config)
    aks_target.wait_for_completion(show_output=True)
from azureml.core.webservice import AksWebservice
# Example
gpu_aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,
                                                    num_replicas=3,
                                                    cpu_cores=4,
                                                    memory_gb=10)
Everything succeeds; then I try and deploy the model for inference:
from azureml.core.model import Model
model = Model(ws, name="thenameofmymodel")
# Name of the web service that is deployed
aks_service_name = 'tryingtodeply'
# Deploy the model
aks_service = Model.deploy(ws,
                           aks_service_name,
                           models=[model],
                           inference_config=inference_config,
                           deployment_config=gpu_aks_config,
                           deployment_target=aks_target,
                           overwrite=True)
aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)
And it fails, saying that it can't find the environment. More specifically, my environment is at version 11, but the deployment keeps trying to fetch an environment one version higher (i.e., version 12) than the current one:
Failed
ERROR - Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: 0f03a025-3407-4dc1-9922-a53cc27267d4
More information can be found here:
Error:
{
  "code": "BadRequest",
  "statusCode": 400,
  "message": "The request is invalid",
  "details": [
    {
      "code": "EnvironmentDetailsFetchFailedUserError",
      "message": "Failed to fetch details for Environment with Name: myenv Version: 12."
    }
  ]
}
I have tried to manually edit the environment JSON to match the version that azureml is trying to fetch, but nothing works. Can anyone see anything wrong with this code?
Update
Changing the name of the environment (e.g., my_inference_env) and passing it to InferenceConfig seems to be on the right track. However, the error now changes to the following
Running..........
Failed
ERROR - Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: f0dfc13b-6fb6-494b-91a7-de42b9384692
More information can be found here: https://some_long_http_address_that_leads_to_nothing
Error:
{
  "code": "DeploymentFailed",
  "statusCode": 404,
  "message": "Deployment not found"
}
Solution
The answer from Anders below is indeed correct regarding the use of Azure ML environments. However, the last error I was getting was because I was setting the container image using the digest value (a SHA) and NOT the image name and tag (e.g., imagename:tag). Note this line of code in the first block:
myenv.docker.base_image = "4fb3..."
This references the digest value; it should be changed to
myenv.docker.base_image = "imagename:tag"
Once I made that change, the deployment succeeded! :)
One concept that took me a while to get was the bifurcation of registering and using an Azure ML Environment. If you have already registered your env, myenv, and none of its details have changed, there is no need to re-register it with myenv.register(). You can simply get the already-registered env using Environment.get() like so:
myenv = Environment.get(ws, name='myenv', version=11)
My recommendation would be to name your environment something new, like "model_scoring_env". Register it once, then pass it to the InferenceConfig; a sketch of that pattern follows.
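Putting the pieces together, a minimal sketch of that register-once/get-later pattern (the environment name is illustrative, and the Docker setup is reduced to the name:tag fix from the solution above):

from azureml.core import Environment
from azureml.core.model import InferenceConfig

# One-time registration under a fresh name:
scoring_env = Environment(name="model_scoring_env")
scoring_env.docker.base_image = "imagename:tag"  # image name and tag, not the digest SHA
scoring_env.python.user_managed_dependencies = True
scoring_env.register(workspace=ws)

# In later deployment runs, fetch the registered environment instead of
# re-registering it:
scoring_env = Environment.get(ws, name="model_scoring_env")
inference_config = InferenceConfig(entry_script="score.py", environment=scoring_env)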

Unable to build local AMLS environment with private wheel

I am trying to write a small program using the AzureML Python SDK (v1.0.85) to register an Environment in AMLS and use that definition to construct a local Conda environment when experiments are run (for a pre-trained model). The code works fine for simple scenarios where all dependencies are loaded from Conda/public PyPI, but when I introduce a private dependency (e.g. a utils library) I get an InternalServerError with the message "Error getting recipe specifications".
The code I am using to register the environment is (after having authenticated to Azure and connected to our workspace):
environment_name = config['environment']['name']
py_version = "3.7"
conda_packages = ["pip"]
pip_packages = ["azureml-defaults"]
private_packages = ["./env-wheels/utils-0.0.3-py3-none-any.whl"]
print(f"Creating environment with name {environment_name}")
environment = Environment(name=environment_name)
conda_deps = CondaDependencies()
print(f"Adding Python version: {py_version}")
conda_deps.set_python_version(py_version)
for conda_pkg in conda_packages:
    print(f"Adding Conda dependency: {conda_pkg}")
    conda_deps.add_conda_package(conda_pkg)
for pip_pkg in pip_packages:
    print(f"Adding Pip dependency: {pip_pkg}")
    conda_deps.add_pip_package(pip_pkg)
for private_pkg in private_packages:
    print(f"Uploading private wheel from {private_pkg}")
    private_pkg_url = Environment.add_private_pip_wheel(workspace=ws, file_path=Path(private_pkg).absolute(), exist_ok=True)
    print(f"Adding private Pip dependency: {private_pkg_url}")
    conda_deps.add_pip_package(private_pkg_url)
environment.python.conda_dependencies = conda_deps
environment.register(workspace=ws)
And the code I am using to create the local Conda environment is:
amls_environment = Environment.get(ws, name=environment_name, version=environment_version)
print("Building environment...")
amls_environment.build_local(workspace=ws)
The exact error message being returned when build_local(...) is called is:
Traceback (most recent call last):
  File "C:\Anaconda\envs\AMLSExperiment\lib\site-packages\azureml\core\environment.py", line 814, in build_local
    raise error
  File "C:\Anaconda\envs\AMLSExperiment\lib\site-packages\azureml\core\environment.py", line 807, in build_local
    recipe = environment_client._get_recipe_for_build(name=self.name, version=self.version, **payload)
  File "C:\Anaconda\envs\AMLSExperiment\lib\site-packages\azureml\_restclient\environment_client.py", line 171, in _get_recipe_for_build
    raise Exception(message)
Exception: Error getting recipe specifications. Code: 500
: {
  "error": {
    "code": "ServiceError",
    "message": "InternalServerError",
    "detailsUri": null,
    "target": null,
    "details": [],
    "innerError": null,
    "debugInfo": null
  },
  "correlation": {
    "operation": "15043e1469e85a4c96a3c18c45a2af67",
    "request": "19231be75a2b8192"
  },
  "environment": "westeurope",
  "location": "westeurope",
  "time": "2020-02-28T09:38:47.8900715+00:00"
}
Process finished with exit code 1
Has anyone seen this error before, or can anyone provide some guidance on what the issue may be?
The issue was with our firewall blocking the required requests between AMLS and the storage container (I presume to fetch the environment definitions/private wheels).
We resolved this by updating the firewall with appropriate ALLOW rules for the AMLS service to contact and read from the attached storage container.
Assuming that you'd like to run the script on a remote compute, my suggestion would be to pass the environment you just "got" to a RunConfiguration, then pass that to a ScriptRunConfig, Estimator, or PythonScriptStep:
from azureml.core import ScriptRunConfig
from azureml.core.runconfig import DEFAULT_CPU_IMAGE
src = ScriptRunConfig(source_directory=project_folder, script='train.py')
# Set compute target to the one created in previous step
src.run_config.target = cpu_cluster.name
# Set environment
amls_environment = Environment.get(ws, name=environment_name, version=environment_version)
src.run_config.environment = amls_environment
run = experiment.submit(config=src)
run
Check out the rest of the notebook here.
If you're looking for a local run this notebook might help.

Getting error while connecting to Redshift from AWS Lambda function

I am trying to connect to Redshift from AWS Lambda Python code using the psycopg2 library. When running the same code from an EC2 instance I don't get any error. I am getting the error response below:
{
  "errorMessage": "FATAL: no pg_hba.conf entry for host \"::xxxxx\", user \"xxxx\", database \"xxxx\", SSL off\n",
  "errorType": "OperationalError",
  "stackTrace": [
    [
      "/var/task/aws_unload_to_s3_audit.py",
      86,
      "lambda_handler",
      "mainly()"
    ],
    [
      "/var/task/aws_unload_to_s3_audit.py",
      74,
      "mainly",
      "con = psycopg2.connect(conn_string)"
    ],
    [
      "/var/task/psycopg2/__init__.py",
      130,
      "connect",
      "conn = _connect(dsn, connection_factory=connection_factory, **kwasync)"
    ]
  ]
}
My suggestion would be to check the network configuration for Redshift; chances are that the connection is being refused.
Places to check:
Redshift security group
VPC configuration, if the Lambda resides in a private subnet
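Separately, since the FATAL message above ends in "SSL off", it may also be worth forcing SSL on the client side once the network path is confirmed; a minimal sketch with placeholder connection values:

import psycopg2

# All connection values below are placeholders; sslmode="require" forces an
# SSL connection, which matters if the cluster rejects non-SSL clients.
conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="mydb",
    user="myuser",
    password="mypassword",
    sslmode="require",
    connect_timeout=10,  # fail fast if a security group or subnet blocks the route
)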
