How to format the file path in an MLTable for Azure Machine Learning uploaded during a pipeline job? - azure-machine-learning-service

How is the path to a (.csv) file to be expressed in a MLTable file
that is created in a local folder but then uploaded as part of a
pipline job?
I'm following the Jupyter notebook automl-forecasting-task-energy-demand-advance from the azuerml-examples repo (article and notebook). This example has a MLTable file as below referencing a .csv file with a relative path. Then in the pipeline the MLTable is uploaded to be accessible to a remote compute (a few things are omitted for brevity)
my_training_data_input = Input(
type=AssetTypes.MLTABLE, path="./data/training-mltable-folder"
)
compute = AmlCompute(
name=compute_name, size="STANDARD_D2_V2", min_instances=0, max_instances=4
)
forecasting_job = automl.forecasting(
compute=compute_name, # name of the compute target we created above
# name="dpv2-forecasting-job-02",
experiment_name=exp_name,
training_data=my_training_data_input,
# validation_data = my_validation_data_input,
target_column_name="demand",
primary_metric="NormalizedRootMeanSquaredError",
n_cross_validations="auto",
enable_model_explainability=True,
tags={"my_custom_tag": "My custom value"},
)
returned_job = ml_client.jobs.create_or_update(
forecasting_job
)
ml_client.jobs.stream(returned_job.name)
But running this gives the error
Error meassage:
Encountered user error while fetching data from Dataset. Error: UserErrorException:
Message: MLTable yaml schema is invalid:
Error Code: Validation
Validation Error Code: Invalid MLTable
Validation Target: MLTableToDataflow
Error Message: Failed to convert a MLTable to dataflow
uri path is not a valid datastore uri path
| session_id=857bd9a1-097b-4df6-aa1c-8871f89580d8
InnerException None
ErrorResponse
{
"error": {
"code": "UserError",
"message": "MLTable yaml schema is invalid: \nError Code: Validation\nValidation Error Code: Invalid MLTable\nValidation Target: MLTableToDataflow\nError Message: Failed to convert a MLTable to dataflow\nuri path is not a valid datastore uri path\n| session_id=857bd9a1-097b-4df6-aa1c-8871f89580d8"
}
}
paths:
- file: ./nyc_energy_training_clean.csv
transformations:
- read_delimited:
delimiter: ','
encoding: 'ascii'
- convert_column_types:
- columns: demand
column_type: float
- columns: precip
column_type: float
- columns: temp
column_type: float
How am I supposed to run this? Thanks in advance!

For Remote PATH you can use the below and here is the document for create data assets.
It's important to note that the path specified in the MLTable file must be a valid path in the cloud, not just a valid path on your local machine.

Related

Input format for Tensorflow models on GCP AI Platform

I have a uploaded a model to GCP AI Platform Models. It's a simple Keras, Multistep Model, with 5 features trained on 168 lagged values. When I am trying to test the models in, I'm getting this strange error message:
"error": "Prediction failed: Error during model execution: <_MultiThreadedRendezvous of RPC that terminated with:\n\tstatus = StatusCode.FAILED_PRECONDITION\n\tdetails = \"Error while reading resource variable dense_7/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/dense_7/bias)\n\t [[{{node model_2/dense_7/BiasAdd/ReadVariableOp}}]]\"\n\tdebug_error_string = \"{\"created\":\"#1618946146.138507164\",\"description\":\"Error received from peer ipv4:127.0.0.1:8081\",\"file\":\"src/core/lib/surface/call.cc\",\"file_line\":1061,\"grpc_message\":\"Error while reading resource variable dense_7/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/dense_7/bias)\\n\\t [[{{node model_2/dense_7/BiasAdd/ReadVariableOp}}]]\",\"grpc_status\":9}\"\n>"
The input is on the following format, a list ((1, 168, 5))
See below of example:
{
"instances":
[[[ 3.10978284e-01, 2.94650396e-01, 8.83664149e-01,
1.60210423e+00, -1.47402699e+00],
[ 3.10978284e-01, 2.94650396e-01, 5.23466315e-01,
1.60210423e+00, -1.47402699e+00],
[ 8.68576328e-01, 7.78699823e-01, 2.83334426e-01,
1.60210423e+00, -1.47402699e+00]]]
}

Azure-ML Deployment does NOT see AzureML Environment (wrong version number)

I've followed the documentation pretty well as outlined here.
I've setup my azure machine learning environment the following way:
from azureml.core import Workspace
# Connect to the workspace
ws = Workspace.from_config()
from azureml.core import Environment
from azureml.core import ContainerRegistry
myenv = Environment(name = "myenv")
myenv.inferencing_stack_version = "latest" # This will install the inference specific apt packages.
# Docker
myenv.docker.enabled = True
myenv.docker.base_image_registry.address = "myazureregistry.azurecr.io"
myenv.docker.base_image_registry.username = "myusername"
myenv.docker.base_image_registry.password = "mypassword"
myenv.docker.base_image = "4fb3..."
myenv.docker.arguments = None
# Environment variable (I need python to look at folders
myenv.environment_variables = {"PYTHONPATH":"/root"}
# python
myenv.python.user_managed_dependencies = True
myenv.python.interpreter_path = "/opt/miniconda/envs/myenv/bin/python"
from azureml.core.conda_dependencies import CondaDependencies
conda_dep = CondaDependencies()
conda_dep.add_pip_package("azureml-defaults")
myenv.python.conda_dependencies=conda_dep
myenv.register(workspace=ws) # works!
I have a score.py file configured for inference (not relevant to the problem I'm having)...
I then setup inference configuration
from azureml.core.model import InferenceConfig
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)
I setup my compute cluster:
from azureml.core.compute import ComputeTarget, AksCompute
from azureml.exceptions import ComputeTargetException
# Choose a name for your cluster
aks_name = "theclustername"
# Check to see if the cluster already exists
try:
aks_target = ComputeTarget(workspace=ws, name=aks_name)
print('Found existing compute target')
except ComputeTargetException:
print('Creating a new compute target...')
prov_config = AksCompute.provisioning_configuration(vm_size="Standard_NC6_Promo")
aks_target = ComputeTarget.create(workspace=ws, name=aks_name, provisioning_configuration=prov_config)
aks_target.wait_for_completion(show_output=True)
from azureml.core.webservice import AksWebservice
# Example
gpu_aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,
num_replicas=3,
cpu_cores=4,
memory_gb=10)
Everything succeeds; then I try and deploy the model for inference:
from azureml.core.model import Model
model = Model(ws, name="thenameofmymodel")
# Name of the web service that is deployed
aks_service_name = 'tryingtodeply'
# Deploy the model
aks_service = Model.deploy(ws,
aks_service_name,
models=[model],
inference_config=inference_config,
deployment_config=gpu_aks_config,
deployment_target=aks_target,
overwrite=True)
aks_service.wait_for_deployment(show_output=True)
print(aks_service.state)
And it fails saying that it can't find the environment. More specifically, my environment version is version 11, but it keeps trying to find an environment with a version number that is 1 higher (i.e., version 12) than the current environment:
FailedERROR - Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: 0f03a025-3407-4dc1-9922-a53cc27267d4
More information can be found here:
Error:
{
"code": "BadRequest",
"statusCode": 400,
"message": "The request is invalid",
"details": [
{
"code": "EnvironmentDetailsFetchFailedUserError",
"message": "Failed to fetch details for Environment with Name: myenv Version: 12."
}
]
}
I have tried to manually edit the environment JSON to match the version that azureml is trying to fetch, but nothing works. Can anyone see anything wrong with this code?
Update
Changing the name of the environment (e.g., my_inference_env) and passing it to InferenceConfig seems to be on the right track. However, the error now changes to the following
Running..........
Failed
ERROR - Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: f0dfc13b-6fb6-494b-91a7-de42b9384692
More information can be found here: https://some_long_http_address_that_leads_to_nothing
Error:
{
"code": "DeploymentFailed",
"statusCode": 404,
"message": "Deployment not found"
}
Solution
The answer from Anders below is indeed correct regarding the use of azure ML environments. However, the last error I was getting was because I was setting the container image using the digest value (a sha) and NOT the image name and tag (e.g., imagename:tag). Note the line of code in the first block:
myenv.docker.base_image = "4fb3..."
I reference the digest value, but it should be changed to
myenv.docker.base_image = "imagename:tag"
Once I made that change, the deployment succeeded! :)
One concept that took me a while to get was the bifurcation of registering and using an Azure ML Environment. If you have already registered your env, myenv, and none of the details of the your environment have changed, there is no need re-register it with myenv.register(). You can simply get the already register env using Environment.get() like so:
myenv = Environment.get(ws, name='myenv', version=11)
My recommendation would be to name your environment something new: like "model_scoring_env". Register it once, then pass it to the InferenceConfig.

Terraform: YAML file rendering issue in storage section of container linux config of flatcar OS

I am trying to generate a file by template rendering to pass to the user data of the ec2 instance. I am using the third party terraform provider to generate an ignition file from the YAML.
data "ct_config" "worker" {
content = data.template_file.file.rendered
strict = true
pretty_print = true
}
data "template_file" "file" {
...
...
template = file("${path.module}/example.yml")
vars = {
script = file("${path.module}/script.sh")
}
}
example.yml
storage:
files:
- path: "/opt/bin/script"
mode: 0755
contents:
inline: |
${script}
Error:
Error: Error unmarshaling yaml: yaml: line 187: could not find expected ':'
on ../../modules/launch_template/launch_template.tf line 22, in data "ct_config" "worker":
22: data "ct_config" "worker" {
If I change ${script} to sample data then it works. Also, No matter what I put in the script.sh I am getting the same error.
You want this outcome (pseudocode):
storage:
files:
- path: "/opt/bin/script"
mode: 0755
contents:
inline: |
{{content of script file}}
In your current implementation, all lines after the first loaded from script.sh will not be indented and will not be interpreted as desired (the entire script.sh content) by a YAML decoder.
Using indent you can correct the indentation and using the newer templatefile functuin you can use a slightly cleaner setup for the template:
data "ct_config" "worker" {
content = local.ct_config_content
strict = true
pretty_print = true
}
locals {
ct_config_content = templatefile("${path.module}/example.yml", {
script = indent(10, file("${path.module}/script.sh"))
})
}
For clarity, here is the example.yml template file (from the original question) to use with the code above:
storage:
files:
- path: "/opt/bin/script"
mode: 0755
contents:
inline: |
${script}
I had this exact issue with ct_config, and figured it out today. You need to base64encode your script to ensure it's written correctly without newlines - without that, newlines in your script will make it to CT, which attempts to build an Ignition file, which cannot have newlines, causing the error you ran into originally.
Once encoded, you then just need to tell CT to !!binary the file to ensure Ignition correctly base64 decodes it on deploy:
data "template_file" "file" {
...
...
template = file("${path.module}/example.yml")
vars = {
script = base64encode(file("${path.module}/script.sh"))
}
}
storage:
files:
- path: "/opt/bin/script"
mode: 0755
contents:
inline: !!binary |
${script}

How to create secured files in Puppet5 with Hiera?

I want to create SSL certificate and try to secure this operation.
I am using Puppet 5.5.2 and gem hiera-eyaml.
Created simple manifest
cat /etc/puppetlabs/code/environments/production/manifests/site.pp
package { 'tree':
ensure => installed,
}
package { 'httpd':
ensure => installed,
}
$filecrt = lookup('files')
create_resources( 'file', $filecrt )
Hiera config
---
version: 5
defaults:
# The default value for "datadir" is "data" under the same directory as the hiera.yaml
# file (this file)
# When specifying a datadir, make sure the directory exists.
# See https://puppet.com/docs/puppet/latest/environments_about.html for further details on environments.
datadir: data
data_hash: yaml_data
hierarchy:
- name: "Secret data: per-node, per-datacenter, common"
lookup_key: eyaml_lookup_key # eyaml backend
paths:
- "nodes/%{facts.fqdn}.eyaml"
- "nodes/%{trusted.certname}.eyaml" # Include explicit file extension
- "location/%{facts.whereami}.eyaml"
- "common.eyaml"
options:
pkcs7_private_key: /etc/puppetlabs/puppet/eyaml/keys/private_key.pkcs7.pem
pkcs7_public_key: /etc/puppetlabs/puppet/eyaml/keys/public_key.pkcs7.pem
- name: "YAML hierarchy levels"
paths:
- "common.yaml"
- "nodes/%{facts.fqdn}.yaml"
- "nodes/%{::trusted.certname}.yaml"
And common.yaml
---
files:
'/etc/httpd/conf/server.crt':
ensure: present
mode: '0600'
owner: 'root'
group: 'root'
content: 'ENC[PKCS7,{LOT_OF_STRING_SKIPPED}+uaCmcHgDAzsPD51soM+AIkIlv0ANpUXzBpwM3tqQ3ysFtz81S0xuVbKvslK]'
But have en error while applying manifest
Error: Evaluation Error: Error while evaluating a Function Call, create_resources(): second argument must be a hash (file: /etc/puppetlabs/code/environments/production/manifests/site.pp, line: 12, column: 1) on node test1.com
I really dont know what to do )
The problem appears to be that the indentation in common.yaml isn't right - currently, file will be null rather than a hash, which explains the error message. Also, the file should be called common.eyaml, otherwise the ENC string won't be decrypted. Try
---
files:
'/etc/httpd/conf/server.crt':
ensure: present
mode: '0600'
owner: 'root'
group: 'root'
content: 'ENC[PKCS7{LOTS_OF_STRING_SKIPPED}UXzBpwM3tqQ3ysFtz81S0xuVbKvslK]'
There is an online YAML parser at http://yaml-online-parser.appspot.com/ if you want to see the difference the indentation makes.
Found another solution.
Its was a problem with lookup and hashes. When I have multiply lines in hiera hash, I must specify them https://docs.puppet.com/puppet/4.5/function.html#lookup
So i decided use only 'content' variable to lookup
cat site.pp
$filecrt = lookup('files')
file { 'server.crt':
ensure => present,
path => '/etc/httpd/conf/server.crt',
content => $filecrt,
owner => 'root',
group => 'root',
mode => '0600',
}
and Hiera
---
files:'ENC[PKCS7{LOT_OF_STRING_SKIPPED}+uaCmcHgDAzsPD51soM+AIkIlv0ANpUXzBpwM3tqQ3ysFtz81S0xuVbKvslK]'

Hiera value not able to receiving in puppet profile

I have created a proxy_match.yaml file as a hiera source file in
default hiera datalocation.
The proxy_match.yaml is added in hiera hierarchy
Looking up hiera data in profile
Where and what am I missing, I am not able to receive the hiera data
value and thus the error appears mentioned bellow.
Where,
proxy_match is the new environment created
hierafile 1
/etc/puppetlabs/code/environments/proxy_match/hiera.yaml
version: 5
defaults:
# The default value for "datadir" is "data" under the same directory as the hiera.yaml
# file (this file)
# When specifying a datadir, make sure the directory exists.
# See https://docs.puppet.com/puppet/latest/environments.html for further details on environments.
# datadir: data
# data_hash: yaml_data
hierarchy:
- name: "environment specific yaml"
path: "proxy_match.yaml"
- name: "Per-node data (yaml version)"
path: "nodes/%{::trusted.certname}.yaml"
- name: "Other YAML hierarchy levels"
paths:
- "common.yaml"
proxy_match.yaml hiera data source file
This is the yaml hiera source named as proxy_match.yaml as in herarchy
/etc/puppetlabs/code/environments/proxy_match/data/proxy-match.yaml
---
profiles::apache::servername: "taraserver.com"
profiles::apache::port: "80"
profiles::apache::docroot: "/var/www/tarahost"
hiera lookup in profile
$servername = hiera('profiles::apache::servername',{})
$port = hiera('profiles::apache::port',{})
$docroot = hiera('profiles::apache::docroot',{})
class profile::apache{
#configure apache
include apache
apache::vhost{$servername:
port => $port,
docroot => $docroot,
}
}
#ERROR:
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: {"message":"Server Error: Evaluation Error: Error while evaluating a Resource Statement, Apache::Vhost[7fba80ae621c.domain.name]: parameter 'docroot' expects a value of type Boolean or String, got Undef at /etc/puppetlabs/code/environments/proxy_match/modules/profile/manifests/apache.pp:29 on node 94707b03ff05.domain.name","issue_kind":"RUNTIME_ERROR"}
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
You are defining the variables outside the class definition. When Puppet loads that class, those lines you have before the class are ignored.
What you should have in your profile class is:
class profiles::apache (
String $servername,
Integer $port,
String $docroot,
) {
include apache
apache::vhost { $servername:
port => $port,
docroot => $docroot,
}
}
Note that I used the automatic parameter lookup feature to set your variables, instead of explicit calls to the hiera function.

Resources