Not able to read model.pkl from output folder in Azure ML - azure-machine-learning-service

I'm trying to read the model.pkl file from the artifacts output folder like this:

def init():
    global model
    # infile = open('model.pkl', 'rb')
    # model = pickle.load(infile)
    # model = joblib.load('model.pkl')
    model_path = Model.get_model_path(model_name='<<modelname>>')
    model_path = "outputs/model.pkl"
    # deserialize the model file back into a sklearn model
    model = joblib.load(model_path)

But it's still not working. Please guide me on how to read the model.pkl file from the artifacts output folder; because of this, the deployment to Azure Container Instances is failing.
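For what it's worth, the likely culprit is the second assignment: it overwrites the path returned by Model.get_model_path with a relative outputs/model.pkl, which exists in the training run's artifacts but not inside the scoring container. A minimal sketch of an init() that relies only on the registered model (keeping the <<modelname>> placeholder from the question):

import joblib
from azureml.core.model import Model

def init():
    global model
    # Model.get_model_path resolves to the registered model file inside
    # the deployed container; don't overwrite it with a local path.
    model_path = Model.get_model_path(model_name='<<modelname>>')
    model = joblib.load(model_path)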

Related

How to save and access pickle/hdf5 files in azure machine learning studio

I have a pickle file parameters.pkl containing some parameters of a model and their values. The pickle file was created through the following process:

params = {'scaler': scaler,
          'features': z_tags,
          'Z_reconstruction_loss': Z_reconstruction_loss}
# note: avoid calling this variable `dict`, which shadows the built-in
pickle.dump(params, open('parameters.pkl', 'wb'))

I also have a model file, model_V2.hdf5.
I am new to Azure Machine Learning Studio. It would be helpful to know how the pickle and hdf5 files can be stored in Azure Machine Learning Studio and an API endpoint created, so that the pickle file and its contents can be accessed through the API. I have tried the following:

pip install azureml azureml-core
from azureml.core import Workspace
from azureml.core.webservice import AciWebservice
from azureml.core.webservice import Webservice
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment
from azureml.core.model import Model
from azureml.core.conda_dependencies import CondaDependencies

ws = Workspace.create(
    name='myworkspace',
    subscription_id='<azure-subscription-id>',
    resource_group='myresourcegroup',
    create_resource_group=True,
    location='eastus2'
)
ws.write_config()
ws = Workspace.from_config()

model = Model.register(workspace=ws,
                       model_path="model/parameters.pkl",
                       model_name="parameters",
                       tags={"version": "1"},
                       description="parameters")

# to install required packages
env = Environment('env')
cd = CondaDependencies.create(pip_packages=['pandas==1.1.5', 'azureml-defaults', 'joblib==0.17.0'],
                              conda_packages=['scikit-learn==0.23.2'])
env.python.conda_dependencies = cd

# Register environment to re-use later
env.register(workspace=ws)
print("Registered Environment")
myenv = Environment.get(workspace=ws, name="env")
myenv.save_to_directory('./environ', overwrite=True)

aciconfig = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    tags={"data": "parameters"},
    description='parameters MODEL'
)
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)
What should I modify in the following score script, given that I don't want to predict anything but only want to access the parameter values stored in the pickle file?
def init():
    global model
    model_path = Model.get_model_path("parameters")
    print("Model Path is ", model_path)
    model = joblib.load(model_path)

def run(data):
    try:
        data = json.loads(data)
        result = model.predict(data['data'])
        return {'data': result.tolist(), 'message': "Successfully accessed"}
    except Exception as e:
        error = str(e)
        return {'data': error, 'message': 'Failed to access'}
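Since the goal is only to return the stored parameter values, one possible modification of the score script, sketched under the assumption that parameters.pkl holds exactly the dictionary shown above (the fitted scaler is left out because it is not JSON-serializable):

import joblib
from azureml.core.model import Model

def init():
    global params
    # parameters.pkl holds a plain dict, not a trained estimator
    params_path = Model.get_model_path("parameters")
    params = joblib.load(params_path)

def run(data):
    try:
        # Return the stored values; everything must be JSON-serializable,
        # so cast numpy types as needed and skip the scaler object.
        return {'features': list(params['features']),
                'Z_reconstruction_loss': float(params['Z_reconstruction_loss']),
                'message': 'Successfully accessed'}
    except Exception as e:
        return {'data': str(e), 'message': 'Failed to access'}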
Deploy the Model
service = Model.deploy(workspace=ws,
                       name='iris-model',
                       models=[model],
                       inference_config=inference_config,
                       deployment_config=aciconfig,
                       overwrite=True)
service.wait_for_deployment(show_output=True)
url = service.scoring_uri
print(url)
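Once deployed, the endpoint is just a REST API; a minimal sketch of calling it (the payload shape depends on what your run() expects):

import json
import requests

# url is the scoring_uri printed above
response = requests.post(url,
                         data=json.dumps({'data': []}),
                         headers={'Content-Type': 'application/json'})
print(response.json())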
We need to deploy the model as an external service, i.e. as a web service application, in Azure Machine Learning Studio.
Follow the steps below to reproduce the deployment of the model as a web service application that can serve API calls.
Note: Before proceeding with the steps, have the model's pickle file handy.
Open https://studio.azureml.net/
Click on New
Then click on Upload from local files
Select the zip file to upload
Under Experiments, upload the dataset
Use the canvas to create the model, mapping the dataset and modules as needed
After mapping all the elements and executing the Python script, click on Set Up Web Service
Click on Deploy Web Service

TensorFlow model import error: NewRandomAccessFile

I get this error when I try to load the model into Keras as a pb file. This is how the file structure looks:
- SavedModel
    - variables
        variables.index
    saved_model.pb
- SavedImgModel
    - variables
        variables.index
    saved_model.pb
def load_model():
    global model
    global img_model
    model_path = os.path.join('.', 'SavedModel\\')
    img_modelpath = os.path.join('.', 'SavedImgModel\\')
    print(model_path)
    print("* Loading Keras model and Flask starting server... please wait until the server has fully started")
    model = tf.keras.models.load_model(model_path)
    img_model = tf.keras.models.load_model(img_modelpath)

# initialize our Flask application and the Keras model
app = Flask(__name__)
load_model()
detector = MTCNN(
The error:

raise errors_impl.OpError(None, None, error_message, errors_impl.UNKNOWN)
tensorflow.python.framework.errors_impl.OpError: NewRandomAccessFile failed to Create/Open: .\SavedModel\variables\variables.data-00000-of-00001 : The system cannot find the file specified.
; No such file or directory
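For what it's worth, the traceback points at a missing variables.data-00000-of-00001 shard: the directory listing above shows only variables.index under variables, so the SavedModel folders appear incomplete (the data shard may have been lost when copying the model). A small sanity check before loading, as a sketch:

import os

def check_saved_model(model_dir):
    # A loadable SavedModel needs saved_model.pb plus both variables
    # shards (variables.index and variables.data-*) next to it.
    print(sorted(os.listdir(model_dir)))
    print(sorted(os.listdir(os.path.join(model_dir, 'variables'))))

check_saved_model('SavedModel')
check_saved_model('SavedImgModel')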

FastText cannot be opened for loading from bucket

I downloaded the fastText model lid.176.bin. If I run my code locally with the model in a folder, everything works fine. But I need to run it in Google Cloud, so I uploaded the model to a bucket and changed the path from the local one to the GCS bucket, and got an error: ValueError: gs://models/fasttext-model/lid.176.bin cannot be opened for loading!
How can I use the model from the bucket?
path_to_pretrained_model = 'gs://models/fasttext-model/lid.176.bin'
fasttext_model = fasttext.load_model(path_to_pretrained_model)
This function from the Google Cloud documentation helped me solve the problem:
from google.cloud import storage

def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # bucket_name = "your-bucket-name"
    # source_blob_name = "storage-object-name"
    # destination_file_name = "local/path/to/file"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)

    # Construct a client-side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
    # any content from Google Cloud Storage. As we don't need additional data,
    # using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)

    print(
        "Blob {} downloaded to {}.".format(
            source_blob_name, destination_file_name
        )
    )
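Putting it together: download the model to local disk first, then point fastText at the local copy (bucket and blob names taken from the path in the question):

download_blob('models', 'fasttext-model/lid.176.bin', '/tmp/lid.176.bin')
fasttext_model = fasttext.load_model('/tmp/lid.176.bin')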

Why does my Flask app work when executing using `python app.py` but not when using `heroku local web` or `flask run`?

I wrote a Flask-based web app that takes text from users and returns the probability that it belongs to a given classification (full script below). The app loads some of the trained models needed to make predictions before any requests are made. I am currently trying to deploy it on Heroku and experiencing some problems.
I am able to run it locally when I execute python ml_app.py. But when I use the Heroku CLI command heroku local web to run it locally as a test before deployment, I get the following error:
AttributeError: module '__main__' has no attribute 'tokenize'
This error is associated with the loading of a TF-IDF text vectorizer in the line
tfidf_model = joblib.load('models/tfidf_vectorizer_train.pkl')
I have imported the required function at the top of the script to ensure that it is available (from utils import tokenize), and this works when I run the app with python ml_app.py. But for reasons I do not understand, the model doesn't load when I use heroku local web. It also fails with the Flask CLI command flask run. Any idea why?
I admit that I do not have a good understanding of what is going on under the hood here (with respect to the web dev/deployment aspects of the code), so any explanation helps.
from flask import Flask, request, render_template
from sklearn.externals import joblib
from utils import tokenize  # custom tokenizer required for tfidf model loaded in load_tfidf_model()

app = Flask(__name__)

models_directory = 'models'

@app.before_first_request
def nbsvm_models():
    global tfidf_model
    global logistic_identity_hate_model
    global logistic_insult_model
    global logistic_obscene_model
    global logistic_severe_toxic_model
    global logistic_threat_model
    global logistic_toxic_model
    tfidf_model = joblib.load('models/tfidf_vectorizer_train.pkl')
    logistic_identity_hate_model = joblib.load('models/logistic_identity_hate.pkl')
    logistic_insult_model = joblib.load('models/logistic_insult.pkl')
    logistic_obscene_model = joblib.load('models/logistic_obscene.pkl')
    logistic_severe_toxic_model = joblib.load('models/logistic_severe_toxic.pkl')
    logistic_threat_model = joblib.load('models/logistic_threat.pkl')
    logistic_toxic_model = joblib.load('models/logistic_toxic.pkl')

@app.route('/')
def my_form():
    return render_template('main.html')

@app.route('/', methods=['POST'])
def my_form_post():
    """
    Take the comment submitted by the user, apply the trained TF-IDF
    vectorizer to it, and predict using the trained models.
    """
    text = request.form['text']
    comment_term_doc = tfidf_model.transform([text])

    dict_preds = {}
    dict_preds['pred_identity_hate'] = logistic_identity_hate_model.predict_proba(comment_term_doc)[:, 1][0]
    dict_preds['pred_insult'] = logistic_insult_model.predict_proba(comment_term_doc)[:, 1][0]
    dict_preds['pred_obscene'] = logistic_obscene_model.predict_proba(comment_term_doc)[:, 1][0]
    dict_preds['pred_severe_toxic'] = logistic_severe_toxic_model.predict_proba(comment_term_doc)[:, 1][0]
    dict_preds['pred_threat'] = logistic_threat_model.predict_proba(comment_term_doc)[:, 1][0]
    dict_preds['pred_toxic'] = logistic_toxic_model.predict_proba(comment_term_doc)[:, 1][0]

    for k in dict_preds:
        perc = dict_preds[k] * 100
        dict_preds[k] = "{0:.2f}%".format(perc)

    return render_template('main.html', text=text,
                           pred_identity_hate=dict_preds['pred_identity_hate'],
                           pred_insult=dict_preds['pred_insult'],
                           pred_obscene=dict_preds['pred_obscene'],
                           pred_severe_toxic=dict_preds['pred_severe_toxic'],
                           pred_threat=dict_preds['pred_threat'],
                           pred_toxic=dict_preds['pred_toxic'])

if __name__ == '__main__':
    app.run(debug=True)
Fixed it. It was due to the way I pickled the class instance stored in tfidf_vectorizer_train.pkl. The model was created in an IPython notebook, where one of its attributes depended on a tokenizer function that I had defined interactively in the notebook. I soon learned that pickle serializes functions by reference (module and name), not by value, so tfidf_vectorizer_train.pkl only stored a reference to __main__.tokenize rather than the function itself; when the app is started by an entry point whose __main__ doesn't define tokenize, unpickling fails with the AttributeError above.
To fix this, I moved the tokenizer function to a separate utilities Python file and imported the function both in the file where I trained and subsequently pickled the model, and in the file where I unpickled it.
In code, I did
from utils import tokenize
...
tfidfvectorizer = TfidfVectorizer(ngram_range=(1, 2), tokenizer=tokenize,
                                  min_df=3, max_df=0.9, strip_accents='unicode',
                                  use_idf=1, smooth_idf=True, sublinear_tf=1)
train_term_doc = tfidfvectorizer.fit_transform(train[COMMENT])
joblib.dump(tfidfvectorizer, 'models/tfidf_vectorizer_train.pkl')
...
in the file where I trained the model and
from utils import tokenize
...
@app.before_first_request
def load_models():
    # from utils import tokenize
    global tfidf_model
    tfidf_model = joblib.load('{}/tfidf_vectorizer_train.pkl'.format(models_directory))
...
in the file containing the web app code.
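The post never shows utils.py itself; a minimal hypothetical sketch of such a module (the actual tokenizer logic is an assumption):

# utils.py -- hypothetical minimal version; the real tokenizer may differ
import re

_token_pattern = re.compile(r"\b\w+\b")

def tokenize(text):
    # Split a comment into lowercase word tokens.
    return _token_pattern.findall(text.lower())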

How to save Spark model as a file

I'm testing out the example code from https://spark.apache.org/docs/1.6.2/mllib-ensembles.html#random-forests. For some reason, myRandomForestClassificationModel was saved as a directory. How do I save it as a file? I'm new to Spark, so I'm not sure if I did anything wrong in the code.
from pyspark import SparkContext
from pyspark.mllib.tree import RandomForest, RandomForestModel
from pyspark.mllib.util import MLUtils

sc = SparkContext(appName="rf")

# Load and parse the data file into an RDD of LabeledPoint.
data = MLUtils.loadLibSVMFile(sc, '/sample_libsvm_data.txt')
# Split the data into training and test sets (30% held out for testing)
(trainingData, testData) = data.randomSplit([0.7, 0.3])

# Train a RandomForest model.
# Empty categoricalFeaturesInfo indicates all features are continuous.
# Note: Use larger numTrees in practice.
# Setting featureSubsetStrategy="auto" lets the algorithm choose.
model = RandomForest.trainClassifier(trainingData, numClasses=2, categoricalFeaturesInfo={},
                                     numTrees=100, featureSubsetStrategy="auto",
                                     impurity='gini', maxDepth=4, maxBins=32)

# Evaluate model on test instances and compute test error
# (the 1.6 docs use Python 2 tuple-parameter lambdas; indexing works on Python 3 too)
predictions = model.predict(testData.map(lambda x: x.features))
labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
testErr = labelsAndPredictions.filter(lambda vp: vp[0] != vp[1]).count() / float(testData.count())
print('Test Error = ' + str(testErr))
print('Learned classification forest model:')
print(model.toDebugString())

# Save and load model
model.save(sc, "/rf/myRandomForestClassificationModel")
sameModel = RandomForestModel.load(sc, "/rf/myRandomForestClassificationModel")
Nothing is wrong with your code. It is correct that models are saved as a directory; specifically, there are model and metadata subdirectories inside it. This makes sense because Spark is a distributed system: just as writing data back to HDFS or S3 happens in parallel across partitions, saving the model does too.
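If a single file is genuinely required (for shipping the model around, say), one option is simply to archive the saved directory; a sketch using the standard library, assuming the model directory is on the local filesystem (if it landed on HDFS, copy it down first, e.g. with hdfs dfs -get):

import shutil

# Pack the saved model directory into one archive file ...
shutil.make_archive('myRandomForestClassificationModel', 'zip',
                    root_dir='/rf/myRandomForestClassificationModel')

# ... and unpack it again before calling RandomForestModel.load.
shutil.unpack_archive('myRandomForestClassificationModel.zip',
                      '/rf/myRandomForestClassificationModel')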