ModelAssetPathNotFoundInStorage error when mlflow.sklearn.autolog() used to train within an Azure ML YAML Pipeline - scikit-learn

YAML appears correct, there are no validation issues, and the pipeline can be seen in the Azure ML Studio GUI
I'm assuming the error is thrown by mlflow.sklearn.autolog() when the fit() method is called
Full stacktrace not available, the exception shown below is the full exception raised in Azue ML Studio GUI
I've commented out the function save_outputs() and the same error is raised, leading to my assumption regarding MLFlow SDK attempting to autolog the model
I haven't included the code below for the predict job in the pipeline, this step doesn't get to execute as the seep job in the pipeline fails first
Exception raised in Azure ML GUI
UserErrorException:
Message: Model asset creation API failed with {'additional_properties': {'message': 'The request is invalid.', 'details': [{'code': 'ModelAssetPathNotFoundInStorage', 'message': 'No blobs found in storage at model asset path: azureml/HD_9b8798ab-c0cb-4c5d-8822-1411c01af249_0/model/'}], 'code': 'BadRequest', 'statusCode': 400}, 'error': <data_capability._restclient.model.models._models_py3.RootError object at 0x7f210407e310>, 'correlation': {'operation': '214b5eca2adce7f52fdba06fb5003437', 'request': '9515d04215048b81', 'RequestId': '9515d04215048b81'}, 'environment': '<REDACTED>', 'location': '<REDACTED>', 'time': datetime.datetime(2023, 1, 12, 23, 1, 27, 38782, tzinfo=<FixedOffset '+00:00'>), 'component_name': 'modelregistry'}
InnerException None
ErrorResponse
{
"error": {
"code": "UserError",
"message": "Model asset creation API failed with {'additional_properties': {'message': 'The request is invalid.', 'details': [{'code': 'ModelAssetPathNotFoundInStorage', 'message': 'No blobs found in storage at model asset path: azureml/HD_9b8798ab-c0cb-4c5d-8822-1411c01af249_0/model/'}], 'code': 'BadRequest', 'statusCode': 400}, 'error': <data_capability._restclient.model.models._models_py3.RootError object at 0x7f210407e310>, 'correlation': {'operation': '214b5eca2adce7f52fdba06fb5003437', 'request': '9515d04215048b81', 'RequestId': '9515d04215048b81'}, 'environment': '<REDACTED>', 'location': '<REDACTED>', 'time': datetime.datetime(2023, 1, 12, 23, 1, 27, 38782, tzinfo=<FixedOffset '+00:00'>), 'component_name': 'modelregistry'}"
}
} Marking the experiment as failed because initial child jobs have failed due to user error
CLI Command
$ az ml job create --subscription <REDACTED> --resource-group <REDACTED> --workspace-name <REDACTED> --file /home/azureuser/cloudfiles/code/Users/<REDACTED>/repos/<REDACTED>/src/assets/pipeline_tune.yml --stream
RunId: quirky_bone_tf250gdlfg
Web View: https://ml.azure.com/runs/<REDACTED>?wsid=/subscriptions/<REDACTED>/resourcegroups/<REDACTED>/workspaces/<REDACTED>
Streaming logs/azureml/executionlogs.txt
========================================
[2023-01-12 22:58:42Z] Submitting 1 runs, first five are: <REDACTED>
[2023-01-12 23:03:46Z] Execution of experiment failed, update experiment status and cancel running nodes.
Execution Summary
=================
RunId: <REDACTED>
Web View: https://ml.azure.com/runs/<REDACTED>?wsid=/subscriptions/<REDACTED>/resourcegroups/<REDACTED>/workspaces/<REDACTED>
Exception :
{
"error": {
"code": "UserError",
"message": "Pipeline has some failed steps. See child run or execution logs for more details.",
"message_format": "Pipeline has some failed steps. {0}",
"message_parameters": {},
"reference_code": "PipelineHasStepJobFailed",
"details": []
},
"environment": "<REDACTED>",
"location": "<REDACTED>",
"time": "2023-01-12T23:03:45.982134Z",
"component_name": ""
}
Folder structure
src/
assets/
component_train.yml
pipeline_tune.yml
train.py
src/assets/pipeline_tune.yml
# References
# ----------
# - How to create component pipelines
# - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-component-pipelines-cli
# - Pattern reference
# - https://github.com/Azure/azureml-examples/tree/main/cli/jobs/pipelines-with-components/pipeline_with_hyperparameter_sweep
# - Pipeline schema
# - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-job-pipeline
# - Sweep Job schema (hyperparameter tuning)
# - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-job-sweep
# - Core Azure ML YAML syntax
# - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-core-syntax#binding-inputs-and-outputs-between-steps-in-a-pipeline-job
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
# -------------------------------------------------------------------
# Pipeline settings
# - Having inputs defined at the pipelione level, instead of the first
# job, allows for parameterisation of the pipeline via both CLI/SDK
experiment_name: <REDACTED>
description: Tune hyperparemeters for training a scikit-learn SVM on the Iris dataset.
settings:
default_compute: azureml:aml-compute-cpu
default_datastore: azureml:workspaceblobstore
inputs:
data:
type: uri_file
mode: ro_mount
path: wasbs://datasets#azuremlexamples.blob.core.windows.net/iris.csv
outputs:
predict:
type: uri_folder
mode: rw_mount
path: azureml://datastores/workspaceblobstore/paths/<REDACTED>
# -------------------------------------------------------------------
# Jobs
jobs:
# Tune job
tune:
type: sweep
inputs:
data: ${{parent.inputs.data}}
outputs:
model:
type: mlflow_model
test_data:
type: uri_folder
trial: ./component_train.yml
search_space:
c_value:
type: uniform
min_value: 0.5
max_value: 0.9
kernel:
type: choice
values:
- rbf
- linear
- poly
coef0:
type: uniform
min_value: 0.1
max_value: 1
sampling_algorithm: random
objective:
goal: minimize
primary_metric: training_f1_score
limits:
max_total_trials: 20
max_concurrent_trials: 10
timeout: 7200
# Score test data
predict:
type: command
inputs:
model: ${{parent.jobs.tune.outputs.model}}
test_data: ${{parent.jobs.tune.outputs.test_data}}
outputs:
predictions: ${{parent.outputs.predict}}
component: ./component_predict.yml
src/assets/component_train.yml
# References
# ----------
# - How to create component pipelines
# - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-component-pipelines-cli
# - Command schema
# - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-job-command
# - Core Azure ML YAML syntax
# - https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-core-syntax#binding-inputs-and-outputs-between-steps-in-a-pipeline-job
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
type: command
# -------------------------------------------------------------------
# Command settings
version: 1
description: Training a scikit-learn SVM on the Iris dataset
environment: azureml:<REDACTED>:1
inputs:
data:
type: uri_file
c_value:
type: number
default: 1.0
kernel:
type: string
default: rbf
coef0:
type: number
default: 0
outputs:
model:
type: mlflow_model
test_data:
type: uri_folder
# -------------------------------------------------------------------
# Job
code: ..
command: >-
python train.py
--data ${{inputs.data}}
--C ${{inputs.c_value}}
--kernel ${{inputs.kernel}}
--coef0 ${{inputs.coef0}}
--outputs_model ${{outputs.model}}
--outputs_test_data ${{outputs.test_data}}
src/train.py
"""
Notes
-----
- Imports in this file must match the imports in `score.py` to allow
pickle objects to be loaded correctly by `score.py`
References
----------
- Azure ML Environments and ScriptRunConfig for training
- i.e How to execute this script against Azure ML compute cluster
- https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments#use-environments-for-training
- Azure ML - Hyperparameter tuning
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters
- Azure ML - Hyperparameter tuning in Azure Machine Learning pipeline
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-sweep-in-pipeline#how-to-do-hyperparameter-tuning-in-azure-machine-learning-pipeline
- Azure ML - Random, Grid, Bayesian sampling
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters#sampling-the-hyperparameter-space
- Azure ML - How to Train scikit-learn
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-scikit-learn#prepare-the-training-script
- Azure ML - How to train Tesnorflow
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-tensorflow
- Azure ML - How to train Keras
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-keras
- Azure ML - How to train PyTorch
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-pytorch
- Azure ML - Logging with MLFlow
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-log-view-metrics?tabs=jobs#getting-started
- MLFlow - Autologging of frameworks
- https://mlflow.org/docs/latest/tracking.html#automatic-logging
"""
import argparse
from distutils.dir_util import copy_tree
from pathlib import Path
import mlflow.sklearn
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
def get_data(data):
"""Get data and return train/test splits"""
df = pd.read_csv(data)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=0
)
# Return split data
return X_train, X_test, y_train, y_test
def get_hyperparameters(**kwargs):
"""Set hyperparameters here
References
----------
- Understand how to pass hyperparameters to a sklearn pipeline
- https://stackoverflow.com/questions/66388056/why-does-sklearn-pipeline-set-params-not-work
"""
hyperparameters = {
"estimator__C": kwargs.get("c_value"),
"estimator__kernel": kwargs.get("kernel"),
"estimator__coef0": kwargs.get("coef0"),
}
return hyperparameters
def save_outputs(model, model_dir, X_test, y_test, test_data_dir):
"""Save outputs of the training process"""
# Save model
local_dir = "model"
mlflow.sklearn.save_model(model, local_dir)
copy_tree(local_dir, model_dir)
# Save test data
X_test.to_csv(Path(test_data_dir) / "X_test.csv", index=False)
y_test.to_csv(Path(test_data_dir) / "y_test.csv", index=False)
def train_model(hyperparameters, X_train, y_train):
"""Train the model with your chosen framework here"""
# Model architecture
model = Pipeline(
steps=[
("scaler", StandardScaler()),
("estimator", SVC()),
]
)
# Set hyperparameters for training run
model.set_params(**hyperparameters)
# Train model
model.fit(X_train, y_train)
return model
def parse_args():
"""Parse args and hyperparameters"""
parser = argparse.ArgumentParser()
# Parse mandatory args
parser.add_argument("data", help="Path to data for training", type=str)
parser.add_argument("outputs_model", help="Name of the model", type=str)
parser.add_argument("outputs_test_data", help="Path to data for testing", type=str)
# Parse hyperparameter args
parser.add_argument("c_value", help="Coeffiecient for the estimator", type=str)
parser.add_argument("kernel", help="Kernel for the estimator", type=str)
parser.add_argument("coef0", help="Coeffiecient for the estimator", type=str)
# Get args
args = parser.parse_args()
return args
def main(**kwargs):
"""Train the model(s)
Parameters
----------
kwargs : dict
Dictionary of all parsed arguments
"""
# Logging
# - Autologging works with (0.22.1 <= scikit-learn <= 1.1.3)
# - See: https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html#mlflow.sklearn.autolog
mlflow.sklearn.autolog()
# Setup
X_train, X_test, y_train, y_test = get_data(kwargs.get("data"))
hyperparameters = get_hyperparameters(**kwargs)
# Train
model = train_model(hyperparameters, X_train, y_train)
# Output
save_outputs(model, kwargs.get("outputs_model"), X_test, y_test, kwargs.get("outputs_test_data"))
if __name__ == "__main__":
"""Entrypoint for training the model(s)"""
main(**vars(parse_args()))

Related

Email Classifier using Spacy , throwing the below error due to version issue when tried to implement BOW

I'm trying to Create the TextCategorizer with exclusive classes and "bow" architecture but its throwing the below error due to version issue and my python version is 3.8 ,also my spacy version is 3.2.3 , please some one help me in resolving this
######## Main method ########
def main():
# Load dataset
data = pd.read_csv(data_path, sep='\t')
observations = len(data.index)
# print("Dataset Size: {}".format(observations))
# Create an empty spacy model
nlp = spacy.blank("en")
# Create the TextCategorizer with exclusive classes and "bow" architecture
text_cat = nlp.create_pipe(
"textcat",
config={
"exclusive_classes": True,
"architecture": "bow"})
# Adding the TextCategorizer to the created empty model
nlp.add_pipe(text_cat)
# Add labels to text classifier
text_cat.add_label("ham")
text_cat.add_label("spam")
# Split data into train and test datasets
x_train, x_test, y_train, y_test = train_test_split(
data['text'], data['label'], test_size=0.33, random_state=7)
# Create the train and test data for the spacy model
train_lables = [{'cats': {'ham': label == 'ham',
'spam': label == 'spam'}} for label in y_train]
test_lables = [{'cats': {'ham': label == 'ham',
'spam': label == 'spam'}} for label in y_test]
# Spacy model data
train_data = list(zip(x_train, train_lables))
test_data = list(zip(x_test, test_lables))
# Model configurations
optimizer = nlp.begin_training()
batch_size = 5
epochs = 10
# Training the model
train_model(nlp, train_data, optimizer, batch_size, epochs)
# Sample predictions
# print(train_data[0])
# sample_test = nlp(train_data[0][0])
# print(sample_test.cats)
# Train and test accuracy
train_predictions = get_predictions(nlp, x_train)
test_predictions = get_predictions(nlp, x_test)
train_accuracy = accuracy_score(y_train, train_predictions)
test_accuracy = accuracy_score(y_test, test_predictions)
print("Train accuracy: {}".format(train_accuracy))
print("Test accuracy: {}".format(test_accuracy))
# Creating the confusion matrix graphs
cf_train_matrix = confusion_matrix(y_train, train_predictions)
plt.figure(figsize=(10,8))
sns.heatmap(cf_train_matrix, annot=True, fmt='d')
cf_test_matrix = confusion_matrix(y_test, test_predictions)
plt.figure(figsize=(10,8))
sns.heatmap(cf_test_matrix, annot=True, fmt='d')
if __name__ == "__main__":
main()
Below is the error
---------------------------------------------------------------------------
ConfigValidationError Traceback (most recent call last)
<ipython-input-6-a77bb5692b25> in <module>
72
73 if __name__ == "__main__":
---> 74 main()
<ipython-input-6-a77bb5692b25> in main()
12
13 # Create the TextCategorizer with exclusive classes and "bow" architecture
---> 14 text_cat = nlp.add_pipe(
15 "textcat",
16 config={
~\anaconda3\lib\site-packages\spacy\language.py in add_pipe(self, factory_name, name, before, after, first, last, source, config, raw_config, validate)
790 lang_code=self.lang,
791 )
--> 792 pipe_component = self.create_pipe(
793 factory_name,
794 name=name,
~\anaconda3\lib\site-packages\spacy\language.py in create_pipe(self, factory_name, name, config, raw_config, validate)
672 # We're calling the internal _fill here to avoid constructing the
673 # registered functions twice
--> 674 resolved = registry.resolve(cfg, validate=validate)
675 filled = registry.fill({"cfg": cfg[factory_name]}, validate=validate)["cfg"]
676 filled = Config(filled)
~\anaconda3\lib\site-packages\thinc\config.py in resolve(cls, config, schema, overrides, validate)
727 validate: bool = True,
728 ) -> Dict[str, Any]:
--> 729 resolved, _ = cls._make(
730 config, schema=schema, overrides=overrides, validate=validate, resolve=True
731 )
~\anaconda3\lib\site-packages\thinc\config.py in _make(cls, config, schema, overrides, resolve, validate)
776 if not is_interpolated:
777 config = Config(orig_config).interpolate()
--> 778 filled, _, resolved = cls._fill(
779 config, schema, validate=validate, overrides=overrides, resolve=resolve
780 )
~\anaconda3\lib\site-packages\thinc\config.py in _fill(cls, config, schema, validate, resolve, parent, overrides)
831 schema.__fields__[key] = copy_model_field(field, Any)
832 promise_schema = cls.make_promise_schema(value, resolve=resolve)
--> 833 filled[key], validation[v_key], final[key] = cls._fill(
834 value,
835 promise_schema,
~\anaconda3\lib\site-packages\thinc\config.py in _fill(cls, config, schema, validate, resolve, parent, overrides)
897 result = schema.parse_obj(validation)
898 except ValidationError as e:
--> 899 raise ConfigValidationError(
900 config=config, errors=e.errors(), parent=parent
901 ) from None
ConfigValidationError:
Config validation error
textcat -> architecture extra fields not permitted
textcat -> exclusive_classes extra fields not permitted
{'nlp': <spacy.lang.en.English object at 0x000001B90CD4BF70>, 'name': 'textcat', 'architecture': 'bow', 'exclusive_classes': True, 'model': {'#architectures': 'spacy.TextCatEnsemble.v2', 'linear_model': {'#architectures': 'spacy.TextCatBOW.v2', 'exclusive_classes': True, 'ngram_size': 1, 'no_output_layer': False}, 'tok2vec': {'#architectures': 'spacy.Tok2Vec.v2', 'embed': {'#architectures': 'spacy.MultiHashEmbed.v2', 'width': 64, 'rows': [2000, 2000, 1000, 1000, 1000, 1000], 'attrs': ['ORTH', 'LOWER', 'PREFIX', 'SUFFIX', 'SHAPE', 'ID'], 'include_static_vectors': False}, 'encode': {'#architectures': 'spacy.MaxoutWindowEncoder.v2', 'width': 64, 'window_size': 1, 'maxout_pieces': 3, 'depth': 2}}}, 'scorer': {'#scorers': 'spacy.textcat_scorer.v1'}, 'threshold': 0.5, '#factories': 'textcat'}
My Spacy-Version
print(spacy.__version__)
3.2.3
My Python Version
import sys
print(sys.version)
3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]
Tring to downgrade the Spacy-Version
!conda install -c conda-forge spacy = 2.1.8
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working...
Building graph of deps: 0%| | 0/5 [00:00<?, ?it/s]
Examining spacy=2.1.8: 0%| | 0/5 [00:00<?, ?it/s]
Examining python=3.8: 20%|## | 1/5 [00:00<00:00, 4.80it/s]
Examining python=3.8: 40%|#### | 2/5 [00:00<00:00, 9.60it/s]
Examining #/win-64::__cuda==11.6=0: 40%|#### | 2/5 [00:01<00:00, 9.60it/s]
Examining #/win-64::__cuda==11.6=0: 60%|###### | 3/5 [00:01<00:01, 1.97it/s]
Examining #/win-64::__win==0=0: 60%|###### | 3/5 [00:01<00:01, 1.97it/s]
Examining #/win-64::__archspec==1=x86_64: 80%|######## | 4/5 [00:01<00:00, 1.97it/s]
Determining conflicts: 0%| | 0/5 [00:00<?, ?it/s]
Examining conflict for spacy python: 0%| | 0/5 [00:00<?, ?it/s]
UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:
Specifications:
- spacy=2.1.8 -> python[version='>=3.6,<3.7.0a0|>=3.7,<3.8.0a0']
Your python: python=3.8
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.
Please feel free to comment or ask .
Thank you
Just from the way I would understand that error message it tells you that the spacy version you want to install (2.1.8) is incompatible with the python version you have (3.8.8). It needs Python 3.6 or 3.7.
So either create an environment with Python 3.6 or 3.7 (its quite easy to specify Python version when creating a new environment in conda) or use a higher version of spacy. Did you already try if the code works if you just use the newest version of spacy?
Is there a specific reason for why you are using this spacy version? If you are using some methods that are not supported anymore it might make more sense to update your code to the newer spacy methods. Especially if you are doing this to learn about spacy it is counterproductive to learn methods that are not supported anymore. Sadly a lot of tutorials fail to either update their code or at least specify what versions they are using and then leave their code online for years.

How to set the label names when using the Huggingface TextClassificationPipeline?

I am using a fine-tuned Huggingface model (on my company data) with the TextClassificationPipeline to make class predictions. Now the labels that this Pipeline predicts defaults to LABEL_0, LABEL_1 and so on. Is there a way to supply the label mappings to the TextClassificationPipeline object so that the output may reflect the same?
Env:
tensorflow==2.3.1
transformers==4.3.2
Sample Code:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # or any {'0', '1', '2'}
from transformers import TextClassificationPipeline, TFAutoModelForSequenceClassification, AutoTokenizer
MODEL_DIR = "path\to\my\fine-tuned\model"
# Feature extraction pipeline
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
pipeline = TextClassificationPipeline(model=model,
tokenizer=tokenizer,
framework='tf',
device=0)
result = pipeline("It was a good watch. But a little boring.")[0]
Output:
In [2]: result
Out[2]: {'label': 'LABEL_1', 'score': 0.8864616751670837}
The simplest way is to add such a mapping is to edit the config.json of the model to contain: id2label field as below:
{
"_name_or_path": "distilbert-base-uncased",
"activation": "gelu",
"architectures": [
"DistilBertForMaskedLM"
],
"id2label": [
"negative",
"positive"
],
"attention_dropout": 0.1,
.
.
}
A in-code way to set this mapping is by adding the id2label param in the from_pretrained call as below:
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_DIR, id2label={0: 'negative', 1: 'positive'})
Here is the Github Issue I raised for this to get added into the Documentation of transformers.XForSequenceClassification.

Run.get_context() gives the same run id

I am submitting the training through a script file. Following is the content of the train.py script. Azure ML is treating all these as one run (instead of run per alpha value as coded below) as Run.get_context() is returning the same Run id.
train.py
from azureml.opendatasets import Diabetes
from azureml.core import Run
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.externals import joblib
import math
import os
import logging
# Load dataset
dataset = Diabetes.get_tabular_dataset()
print(dataset.take(1))
df = dataset.to_pandas_dataframe()
df.describe()
# Split X (independent variables) & Y (target variable)
x_df = df.dropna() # Remove rows that have missing values
y_df = x_df.pop("Y") # Y is the label/target variable
x_train, x_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=66)
print('Original dataset size:', df.size)
print("Size after dropping 'na':", x_df.size)
print("Training split size: ", x_train.size)
print("Test split size: ", x_test.size)
# Training
alphas = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] # Define hyperparameters
# Create and log interactive runs
output_dir = os.path.join(os.getcwd(), 'outputs')
for hyperparam_alpha in alphas:
# Get the experiment run context
run = Run.get_context()
print("Started run: ", run.id)
run.log("train_split_size", x_train.size)
run.log("test_split_size", x_train.size)
run.log("alpha_value", hyperparam_alpha)
# Train
print("Train ...")
model = Ridge(hyperparam_alpha)
model.fit(X = x_train, y = y_train)
# Predict
print("Predict ...")
y_pred = model.predict(X = x_test)
# Calculate & log error
rmse = math.sqrt(mean_squared_error(y_true = y_test, y_pred = y_pred))
run.log("rmse", rmse)
print("rmse", rmse)
# Serialize the model to local directory
if not os.path.isdir(output_dir):
os.makedirs(output_dir, exist_ok=True)
print("Save model ...")
model_name = "model_alpha_" + str(hyperparam_alpha) + ".pkl" # Pickle file
file_path = os.path.join(output_dir, model_name)
joblib.dump(value = model, filename = file_path)
# Upload the model
run.upload_file(name = model_name, path_or_stream = file_path)
# Complete the run
run.complete()
Experiments view
Authoring code (i.e. control plane)
import os
from azureml.core import Workspace, Experiment, RunConfiguration, ScriptRunConfig, VERSION, Run
ws = Workspace.from_config()
exp = Experiment(workspace = ws, name = "diabetes-local-script-file")
# Create new run config obj
run_local_config = RunConfiguration()
# This means that when we run locally, all dependencies are already provided.
run_local_config.environment.python.user_managed_dependencies = True
# Create new script config
script_run_cfg = ScriptRunConfig(
source_directory = os.path.join(os.getcwd(), 'code'),
script = 'train.py',
run_config = run_local_config)
run = exp.submit(script_run_cfg)
run.wait_for_completion(show_output=True)
Short Answer
Option 1: create child runs within run
run = Run.get_context() assigns the run object of the run that you're currently in to run. So in every iteration of the hyperparameter search, you're logging to the same run. To solve this, you need to create child (or sub-) runs for each hyperparameter value. You can do this with run.child_run(). Below is the template for making this happen.
run = Run.get_context()
for hyperparam_alpha in alphas:
# Get the experiment run context
run_child = run.child_run()
print("Started run: ", run_child.id)
run_child.log("train_split_size", x_train.size)
On the diabetes-local-script-file Experiment page, you can see that Run 9 was the parent run and Runs 10-19 were the child runs if you click "Include child runs" page. There is also a "Child runs" tab on Run 9 details page.
Long answer
I highly recommend abstracting the hyperparameter search away from the data plane (i.e. train.py) and into the control plane (i.e. "authoring code"). This becomes especially valuable as training time increases and you can arbitrarily parallelize and also choose Hyperparameters more intelligently by using Azure ML's Hyperdrive.
Option 2 Create runs from control plane
Remove the loop from your code, add the code like below (full data and control here)
import argparse
from pprint import pprint
parser = argparse.ArgumentParser()
parser.add_argument('--alpha', type=float, default=0.5)
args = parser.parse_args()
print("all args:")
pprint(vars(args))
# use the variable like this
model = Ridge(args.alpha)
below is how to submit a single run using a script argument. To submit multiple runs, just use a loop in the control plane.
alphas = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] # Define hyperparameters
list_rcs = [ScriptRunConfig(
source_directory = os.path.join(os.getcwd(), 'code'),
script = 'train.py',
arguments=['--alpha',a],
run_config = run_local_config) for a in alphas]
list_runs = [exp.submit(rc) for rc in list_rcs]
Option 3 Hyperdrive (IMHO the recommended approach)
In this way you outsource the hyperparameter source to Hyperdrive. The UI will also report results exactly how you want them, and via the API you can easily download the best model. Note you can't use this locally anymore and must use AMLCompute, but to me it is a worthwhile trade-off.This is a great overview. Excerpt below (full code here)
param_sampling = GridParameterSampling( {
"alpha": choice(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)
}
)
estimator = Estimator(
source_directory = os.path.join(os.getcwd(), 'code'),
entry_script = 'train.py',
compute_target=cpu_cluster,
environment_definition=Environment.get(workspace=ws, name="AzureML-Tutorial")
)
hyperdrive_run_config = HyperDriveConfig(estimator=estimator,
hyperparameter_sampling=param_sampling,
policy=None,
primary_metric_name="rmse",
primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
max_total_runs=10,
max_concurrent_runs=4)
run = exp.submit(hyperdrive_run_config)
run.wait_for_completion(show_output=True)

Azure ML Error: You must provide an InferenceConfig when deploying a model with model_framework set to AutoML

I am trying to deploy an Azure ML AutoML generated model with an ML Notebook (script is shortened for brevity):
automl_settings = {
"experiment_timeout_minutes": 20,
"primary_metric": 'AUC_weighted',
"max_concurrent_iterations": 8,
"max_cores_per_iteration": -1,
"enable_dnn": False,
"enable_early_stopping": True,
"validation_size": 0.3,
"verbosity": logging.INFO,
}
automl_config = AutoMLConfig(task = 'classification',
debug_log = 'automl_errors.log',
compute_target=compute_target,
blacklist_models=['LogisticRegression','MultinomialNaiveBayes','BernoulliNaiveBayes','LinearSVM','DecisionTree','RandomForest','ExtremeRandomTrees','LightGBM','KNN','SVM','StackEnsemble','VotingEnsemble'],
training_data=train_dataset,
label_column_name=target_column_name,
**automl_settings
)
automl_run = experiment.submit(automl_config, show_output=True)
best_run, fitted_model = automl_run.get_output()
best_run_metrics = best_run.get_metrics()
children = list(automl_run.get_children(recursive=True))
summary_df = pd.DataFrame(index=['run_id', 'run_algorithm',
'primary_metric', 'Score'])
goal_minimize = False
for run in children:
if('run_algorithm' in run.properties and 'score' in run.properties):
summary_df[run.id] = [run.id, run.properties['run_algorithm'],
run.properties['primary_metric'],
float(run.properties['score'])]
if('goal' in run.properties):
goal_minimize = run.properties['goal'].split('_')[-1] == 'min'
summary_df = summary_df.T.sort_values(
'Score',
ascending=goal_minimize).drop_duplicates(['run_algorithm'])
summary_df = summary_df.set_index('run_algorithm')
best_dnn_run_id = summary_df['run_id'].iloc[0]
best_dnn_run = Run(experiment, best_dnn_run_id)
model_dir = 'Model' # Local folder where the model will be stored temporarily
if not os.path.isdir(model_dir):
os.mkdir(model_dir)
best_run.download_file('outputs/model.pkl', model_dir + '/model.pkl')
# Register the model
model_name = best_run.properties['model_name']
model_path=os.path.join("./outputs",'model.pkl')
description = 'My Model'
model = best_run.register_model(model_name=model_name, model_path=model_path, model_framework='AutoML', description = description, tags={'env': 'sandbox'})
# Deploy the Model
service_name = 'my-ml-service'
service = Model.deploy(ws, service_name, [model], overwrite=True)
service.wait_for_deployment(show_output=True)
Everything appears to run fine until I try to deploy the model:
--------------------------------------------------------------------------- UserErrorException Traceback (most recent call last) <ipython-input-48-5c72d1613c28> in <module>
3 service_name = 'my-service'
4
----> 5 service = Model.deploy(ws, service_name, [model], overwrite=True)
6
7
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/core/model.py in deploy(workspace, name, models, inference_config, deployment_config, deployment_target, overwrite) 1577 logger=module_logger) 1578
-> 1579 return Model._deploy_no_code(workspace, name, models, deployment_config, deployment_target, overwrite) 1580 1581 # Environment-based webservice.
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/core/model.py in _deploy_no_code(workspace, name, models, deployment_config, deployment_target, overwrite) 1795 :rtype: azureml.core.Webservice 1796 """
-> 1797 environment_image_request = build_and_validate_no_code_environment_image_request(models) 1798 1799 return Model._deploy_with_environment_image_request(workspace, name, environment_image_request,
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/_model_management/_util.py in build_and_validate_no_code_environment_image_request(models) 1180 raise UserErrorException('You must provide an InferenceConfig when deploying a model with model_framework ' 1181 'set to {}. Default environments are only provided for these frameworks: {}.'
-> 1182 .format(model.model_framework, Model._SUPPORTED_FRAMEWORKS_FOR_NO_CODE_DEPLOY)) 1183 1184
# Only specify the model IDs; MMS will provide the environment, driver program, etc.
UserErrorException: UserErrorException: Message: You must provide an InferenceConfig when deploying a model with model_framework set to AutoML. Default environments are only provided for these frameworks: ['Onnx', 'ScikitLearn', 'TensorFlow']. InnerException None ErrorResponse {
"error": {
"code": "UserError",
"message": "You must provide an InferenceConfig when deploying a model with model_framework set to AutoML. Default environments are only provided for these frameworks: ['Onnx', 'ScikitLearn', 'TensorFlow']."
}
When deploying an AutoML generated model from the Azure Machine Learning Studio, I am not prompted to provide an entry script or dependencies file (or an InferenceConfig). Is there a way to configure this with the Python SDK so that I can "no code deploy" an AutoML generated model? Is there something wrong in my code? Hope you can help.
I don't think you can rely on "no code" deployment in your scenario, because AutoML may find the best solution is from a framework that is not yet supported by "no code" deployment.
If it helps, you can create the InferenceConfig from your Run:
environment = best_run.get_context().get_environment()
inference_config = InferenceConfig(entry_script='score.py', environment=environment)

Tensor Flow Logistic Regression classifier hanging

I am about 4 weeks into the whole python, machine learning area.
I have written something using LinearClassifier in tensor flow using lending clubs data.
However, when I run the script it hangs at some point.
Any experienced persons help would be appreciated. Here is a copy of the script.
""" Collect and load the data """
import os
import tarfile
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from six.moves import urllib
import tensorflow as tf
from sklearn.preprocessing import LabelBinarizer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import Imputer
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
HOME_PATH = os.getcwd()
""" load the csv file with the lending data and convert to tensors """
def convert_duration(s):
try:
if pd.isnull(s):
return s
elif s[0] == '<':
return 0.0
elif s[:2] == '10':
return 10.0
else:
return np.float(s[0])
except TypeError:
return np.float64(s)
def load_data(file_name):
csv_path = os.path.join(HOME_PATH, file_name)
csv_data = pd.read_csv(csv_path, encoding = "ISO-8859-1", dtype={'desc': np.str, 'verification_status_joint': np.str, 'loan_status': np.str})
loans = csv_data.loc[csv_data['loan_status'].isin(['Fully Paid', 'Charged Off'])] # Sort out only fully Paid (Paid) and Charged Off (Default)
loans['loan_status'] = loans['loan_status'].apply(lambda s: np.float(s == 'Fully Paid')) # Convert to boolean integer
# Drop Columns with one distinct data field
for col in loans.columns:
if loans[col].nunique() == 1:
del loans[col]
for col in loans.columns:
if (loans[col].notnull().sum() / len(loans.index)) < 0.1 :
del loans[col]
# Remove all irrelevant columns & hifg prediction columns based on pure descetion
loans.drop(labels=['id', 'member_id', 'grade', 'sub_grade', 'last_credit_pull_d', 'emp_title', 'url', 'desc', 'title', 'issue_d', 'earliest_cr_line', 'last_pymnt_d','addr_state'], axis=1, inplace=True)
# Process the text based variables
# Term
loans['term'] = loans['term'].apply(lambda s:np.float(s[1:3]))
loans['emp_length'] = loans['emp_length'].apply(lambda s: convert_duration(s))
#change zip code to just the first 3 significant digits
loans['zip_code'] = loans['zip_code'].apply(lambda s:np.float(s[:3]))
loans.fillna(0,inplace=True)
loan_data = shuffle(loans)
X = loan_data.drop(labels=['loan_status'], axis=1)
Y = loan_data['loan_status']
## consider processing tensorflow feature columns here and return as one response and standardise at one
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
# scaler = StandardScaler()
# X_train = scaler.fit_transform(X_train)
# X_test = scaler.fit_transform(X_test)
return (X_train, Y_train), (X_test, Y_test)
def my_input_fn(features, labels, batch_size , shuffle=True):
# consider changing categorical columns and all
dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
dataset = dataset.shuffle(buffer_size=1000).repeat(count=None).batch(batch_size)
return dataset.make_one_shot_iterator().get_next()
#Start on calls to make data available
(X_train, Y_train), (X_test, Y_test) = load_data("loan_data.csv")
my_feature_columns = []
numerical_columns = ['loan_amnt',
'funded_amnt',
'funded_amnt_inv',
'int_rate',
'installment',
'annual_inc',
'dti',
'delinq_2yrs',
'inq_last_6mths',
'mths_since_last_delinq',
'mths_since_last_record',
'open_acc',
'pub_rec',
'revol_bal',
'revol_util',
'total_acc',
'total_pymnt',
'total_pymnt_inv',
'total_rec_prncp',
'total_rec_int',
'total_rec_late_fee',
'recoveries',
'collection_recovery_fee',
'last_pymnt_amnt',
'collections_12_mths_ex_med',
'mths_since_last_major_derog',
'acc_now_delinq',
'tot_coll_amt',
'tot_cur_bal',
'total_rev_hi_lim']
categorical_columns = ['home_ownership',
'verification_status',
'pymnt_plan',
'purpose',
'initial_list_status',
'application_type']
for key in numerical_columns:
my_feature_columns.append(tf.feature_column.numeric_column(key=key))
for key in categorical_columns:
my_feature_columns.append(tf.feature_column.categorical_column_with_hash_bucket(key=key, hash_bucket_size = 10))
classifier = tf.estimator.LinearClassifier(
feature_columns=my_feature_columns
)
classifier.train(
input_fn=lambda:my_input_fn(X_train, Y_train, 100),
steps=100
)
eval_result = classifier.evaluate(
input_fn=lambda:my_input_fn(X_test, Y_test, 100)
)
print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
Here is a sample of the output in the console before it hangs;
43: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
loans['loan_status'] = loans['loan_status'].apply(lambda s: np.float(s == 'Fully Paid')) # Convert to boolean integer
/Users/acacia/Desktop/work/machine_learning/tensor_flow/logistic_regression.py:53: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
loans.drop(labels=['id', 'member_id', 'grade', 'sub_grade', 'last_credit_pull_d', 'emp_title', 'url', 'desc', 'title', 'issue_d', 'earliest_cr_line', 'last_pymnt_d','addr_state'], axis=1, inplace=True)
/Users/acacia/Desktop/work/machine_learning/tensor_flow/logistic_regression.py:57: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
loans['term'] = loans['term'].apply(lambda s:np.float(s[1:3]))
/Users/acacia/Desktop/work/machine_learning/tensor_flow/logistic_regression.py:59: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
loans['emp_length'] = loans['emp_length'].apply(lambda s: convert_duration(s))
/Users/acacia/Desktop/work/machine_learning/tensor_flow/logistic_regression.py:62: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
loans['zip_code'] = loans['zip_code'].apply(lambda s:np.float(s[:3]))
/Users/acacia/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py:3035: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
downcast=downcast, **kwargs)
INFO:tensorflow:Using default config.
WARNING:tensorflow:Using temporary folder as model directory: /var/folders/2t/bhtmq3ln5mb6mv26w6pfbq_m0000gn/T/tmpictbxp6x
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/2t/bhtmq3ln5mb6mv26w6pfbq_m0000gn/T/tmpictbxp6x', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x1a205d6358>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1 into /var/folders/2t/bhtmq3ln5mb6mv26w6pfbq_m0000gn/T/tmpictbxp6x/model.ckpt.
INFO:tensorflow:loss = 69.31472, step = 1
INFO:tensorflow:Saving checkpoints for 100 into /var/folders/2t/bhtmq3ln5mb6mv26w6pfbq_m0000gn/T/tmpictbxp6x/model.ckpt.
INFO:tensorflow:Loss for final step: 0.0.
INFO:tensorflow:Calling model_fn.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
WARNING:tensorflow:Trapezoidal rule is known to produce incorrect PR-AUCs; please switch to "careful_interpolation" instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-05-07-10:55:12
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/2t/bhtmq3ln5mb6mv26w6pfbq_m0000gn/T/tmpictbxp6x/model.ckpt-100
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.

Resources