Azure ML model deployment fails: Module not found error - azure

I'm trying to deploy a model locally using Azure ML before deploying to AKS. I have a custom script that I want to import into my entry script (scoring script), but it's saying it is not found.
Here is the error:
Here's my entry script with the custom script import on line 1:
import rake_refactored as rake
from operator import itemgetter
import pandas as pd
import datetime
import re
import operator
import numpy as np
import json
import os

# Called when the deployed service starts
def init():
    global stopword_path
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    # For multiple models, it points to the folder containing all deployed models (./azureml-models)
    stopword_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'models/SmartStoplist.txt')

# load models
def preprocess(df):
    df = rake.prepare_data(df)
    text = rake.process_response(df, "RESPNS")
    return text

# Use model to make predictions
def predict(df):
    text = preprocess(df)
    return rake.extract_keywords(stopword_path, text)

def run(data):
    try:
        # Find the data property of the JSON request
        df = pd.read_json(json.loads(data))
        prediction = predict(df)
        return json.dumps(prediction)
    except Exception as e:
        return str(e)
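(For reference, a quick local sanity check of this scoring contract could look like the sketch below; the payload shape is an assumption based on what run() expects, and init() needs AZUREML_MODEL_DIR to be set.)

import json
import pandas as pd

# Hypothetical payload: a JSON string wrapping a DataFrame serialized with to_json(),
# which matches the pd.read_json(json.loads(data)) call in run().
sample_payload = json.dumps(pd.DataFrame({"RESPNS": ["example response text"]}).to_json())

init()                      # resolves stopword_path from AZUREML_MODEL_DIR
print(run(sample_payload))  # returns the extracted keywords as a JSON string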
And here is my model artifact directory in Azure ML showing that it is in the same directory as the entry script (rake_score.py).
What am I doing wrong? I had a similar issue before with an sklearn package, which I was able to add to the pip package list when I built the environment, but my custom script isn't a pip package.

I was not able to find rake_refactored in the documentation or anywhere on the internet.
You can try the steps below to install and import rake.
Using pip:
pip install rake-nltk
Or directly from the repository:
git clone https://github.com/csurfer/rake-nltk.git
python rake-nltk/setup.py install
Sample Code:
from rake_nltk import Rake

# Uses stopwords for English from NLTK, and all punctuation characters by default
r = Rake()

# Extraction given the text.
r.extract_keywords_from_text(<text to process>)

# Extraction given the list of strings where each string is a sentence.
r.extract_keywords_from_sentences(<list of sentences>)

# To get keyword phrases ranked highest to lowest.
r.get_ranked_phrases()

# To get keyword phrases ranked highest to lowest with scores.
r.get_ranked_phrases_with_scores()
Refer to https://github.com/csurfer/rake-nltk for details.

In order to access my custom script in my scoring script I needed to explicitly define the source directory in my inference configuration:
from azureml.core.model import InferenceConfig

inference_config = InferenceConfig(
    environment=env,
    entry_script="rake_score.py",
    source_directory='./models'
)
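For context, a minimal local-deployment sketch that uses this inference configuration might look like the following (ws, model, and env are assumed to be an existing Workspace, registered Model, and Environment; the service name and port are placeholders):

from azureml.core.model import Model
from azureml.core.webservice import LocalWebservice

# Deploy the model as a local Docker web service before moving to AKS.
deployment_config = LocalWebservice.deploy_configuration(port=6789)
service = Model.deploy(ws, "rake-local", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)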

Related

catalogue.RegistryError: [E893] Could not find function 'Custom_Candidate_Gen.v1' in function registry 'misc'

I am currently building a spaCy pipeline with custom NER, Entity Linker and Textcat components. For my Entity Linker component, I have modified the candidate_generator() to suit my use case. I have used the ner_emersons demo project for reference. Following is my custom_functions code.
import spacy
from functools import partial
from pathlib import Path
from typing import Iterable, Callable
from spacy.training import Example
from spacy.tokens import DocBin
from spacy.kb import Candidate, KnowledgeBase, get_candidates

@spacy.registry.misc("Custom_Candidate_Gen.v1")
def create_candidates():
    return custom_get_candidates

def custom_get_candidates(kb, span):
    return kb.get_alias_candidates(span.text.lower())

@spacy.registry.readers("MyCorpus.v1")
def create_docbin_reader(file: Path) -> Callable[["Language"], Iterable[Example]]:
    return partial(read_files, file)

def read_files(file: Path, nlp: "Language") -> Iterable[Example]:
    # we run the full pipeline and not just nlp.make_doc to ensure we have entities and sentences
    # which are needed during training of the entity linker
    with nlp.select_pipes(disable="entity_linker"):
        doc_bin = DocBin().from_disk(file)
        docs = doc_bin.get_docs(nlp.vocab)
        for doc in docs:
            yield Example(nlp(doc.text), doc)
After training my entity linker and adding my textcat component to the pipeline, I am getting the following error:
catalogue.RegistryError: [E893] Could not find function 'Custom_Candidate_Gen.v1' in function registry 'misc'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.
Available names: spacy.CandidateGenerator.v1, spacy.EmptyKB.v1, spacy.KBFromFile.v1, spacy.LookupsDataLoader.v1, spacy.ngram_range_suggester.v1, spacy.ngram_suggester.v1
Why isn't my custom Candidate Generator getting registered?
Your options for having custom code loaded and registered when you load a model (a minimal sketch of the first two follows the list):
- import this code directly in your script before loading the model
- package it with your model with spacy package --code and load the model from the installed package name (rather than the directory)
- provide this code in a separate package that uses entry points in setup.cfg to register the methods (which works fine, but wouldn't be my first choice in this situation)
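A minimal sketch of the first two options (pipeline paths and the installed package name are placeholders; custom_functions is the module shown in the question):

# Option 1: import the module that registers the functions before loading the pipeline.
import custom_functions  # runs the @spacy.registry.misc / @spacy.registry.readers decorators
import spacy

nlp = spacy.load("training/model-best")

# Option 2: bake the code into the packaged model, then install and load it by package name:
#   python -m spacy package training/model-best ./packages --code custom_functions.py
#   pip install ./packages/<built_package>
# import en_my_pipeline   # example installed package name
# nlp = en_my_pipeline.load()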
See:

Pyodbc query works in a functional format, but yields Windows Fatal Exception when implemented in an Object Oriented manner

I am querying multiple SQL databases and combining them into a single table. For the purposes of this discussion I am using pseudo-code to describe the queries, because I do not want to give information publicly about the database. However, the query is not the issue so it should not be a problem for this example.
To first ensure that the code works correctly, I implemented a functional version of it in a Jupyter notebook. The code I used is shown below, and it works just fine when the actual query is inserted.
import pyodbc
import pandas as pd

# This code works in a Jupyter notebook using Python 3.8
conn = pyodbc.connect("Driver={XXXX};"
                      "Server=XXXX;"
                      "uid=XXXX;"
                      "pwd=XXXX")
df = pd.read_sql_query("query goes here", conn)
My next step was to implement the code in actual .py files as object oriented code, which is shown below.
class OpenDB:
    def __init__(self):
        # - will pass variables in class instantiation once
        #   I get the classes working correctly
        self.conn = pyodbc.connect("Driver={XXXX};"
                                   "Server=XXXX;"
                                   "uid=XXXX;"
                                   "pwd=XXXX")
        self.cur = self.conn.cursor()
    # ----------------------------------------------------------------------------
    def close_database_connection(self) -> None:
        self.conn.close()
        return
# ============================================================================
# ============================================================================
class ReadDB(OpenDB):
    def __init__(self):
        OpenDB.__init__(self)
    # ----------------------------------------------------------------------------
    def read_data(self) -> pd.DataFrame:
        df = pd.read_sql_query("query goes here", self.conn)
        # - The code fails after the above line and never gets to
        #   the line below this
        self.close_database_connection()
        return df
# ============================================================================
# ============================================================================
db = ReadDB()
df = db.read_data()
The object-oriented code written above appears to be the exact same code used in the Jupyter notebook; however, when I run it as shown above, the query fails with Windows fatal exception: access violation. The queries are identical between the two implementations, and the information passed to establish the database connection is identical. Can anyone see why the Jupyter notebook implementation works and the object-oriented version does not? Both are using Python 3.8.

How to Change Model Pocket Sphinx

I have set up PocketSphinx on Linux and am trying to generate a custom language model. I generated my custom language model using this tool: http://www.speech.cs.cmu.edu/tools/lmtool-new.html
The code works fine with the provided model, but when feeding my custom model I get the following error:
_pocketsphinx.new_Decoder(*args) RuntimeError: new_Decoder returned -1
The used sample code is as follows:
import os
from os import environ, path
from pocketsphinx import LiveSpeech
from sphinxbase import *

pocketsphinx_dir = os.path.dirname(__file__)
print(pocketsphinx_dir)

MODELDIR = "./myModel/model"
MODELDIR1 = "./myModel"

speech = LiveSpeech(
    verbose=False,
    sampling_rate=16000,
    buffer_size=2048,
    no_search=False,
    full_utt=False,
    hmm=os.path.join(MODELDIR, 'en-us/en-us'),
    lm=os.path.join(MODELDIR1, '2506.lm'),
    dic=os.path.join(MODELDIR1, '2506.dic')
)

for phrase in speech:
    print(phrase)
Also note that I already tried using absolute paths as suggested in this answer, but that did not help my case.

Best way of pulling the data from bitbucket repository using python code

I have to develop functionality to pull files from a Bitbucket repository using Python code on a Linux server. The files are located in the Bitbucket repository itself.
Can you suggest how to do this and the best way of doing it? I tried the REST API http:///rest/api/1.0/projects//repos//browse, but it gave me component-level data, i.e. only the file names, not the actual file contents.
Thanks
There is a Python library that wraps the REST API:
https://github.com/cosmin/stashy
Or you can use urllib2:
#!/usr/bin/python
import os
import tempfile
import sys
import urllib2
import json
import base64
import logging
import re
import pprint
import requests
import subprocess

projectKey = "FW"
repoKey = "fw"
branch = "master"
pathToVersionProperties = "core/CruiseControl/CI_version.properties"
localVersionProperties = "CI_version.properties"
bitbucketBaseUrl = "https://bitbucket.company.com/rest/api/latest"

logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s %(message)s')

def checkPersonalAccessToken():
    try:
        os.environ["PAT"]
        logging.info("Detected Personal Access Token")
    except KeyError:
        logging.error("Personal Access Token: $PAT env variable not set, update Jenkins master with correct environment variable")
        sys.exit(1)

def getJenkinsPropertiesFile():
    restEndpoint = "{}/projects/{}/repos/{}/raw/{}".format(bitbucketBaseUrl, projectKey, repoKey, pathToVersionProperties)
    logging.info("REST endpoint : {}".format(restEndpoint))
    request = urllib2.Request(restEndpoint)
    request.add_header("Authorization", "Bearer %s" % os.environ["PAT"])
    result = urllib2.urlopen(request).read()
    return result

checkPersonalAccessToken()
propertiesString = getJenkinsPropertiesFile()
This example retrieves a properties file from Bitbucket. I am not sure what version of Bitbucket you are using. The example above uses Personal Access Tokens for authentication (added in Bitbucket 5.5); you could also use a standard username/password, as in the sketch below.
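A short sketch of the same raw-file request using the requests library with basic auth (the credentials are placeholders; the module-level variables come from the script above):

import requests

# Fetch the raw file contents from the Bitbucket Server "raw" endpoint.
raw_url = "{}/projects/{}/repos/{}/raw/{}".format(
    bitbucketBaseUrl, projectKey, repoKey, pathToVersionProperties)
response = requests.get(raw_url, auth=("username", "password"))  # placeholder credentials
response.raise_for_status()
print(response.text)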

Custom dependencies in the scikit-learn framework of Google Cloud ML Engine

I was searching for a way to add user-defined functions and custom transformers to my ML project, but I have only found examples of how to do this in the TensorFlow framework.
I have created a custom package that can be installed with pip, but I do not know what a setup.py file should look like in the scikit-learn framework.
I would be glad if you could give me some hints.
The pipeline that I am trying to deploy is given below:
from custscaler import StdScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
knn_pipe = Pipeline([
    ('my_std', StdScaler(5)),
    ('my_knn', KNeighborsClassifier(n_neighbors=7))
])
model = knn_pipe.fit(X_train, Y_train)
The custom transformer:
/custscaler/__init__.py
from .fct1 import StdScaler
/custscaler/fct1.py
from sklearn import base

class StdScaler(base.BaseEstimator, base.TransformerMixin):
    def __init__(self, scaling_factor):
        self.scaling_factor = scaling_factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        data = [[el * self.scaling_factor for el in row] for row in X]
        return data
Packaging up dependencies is really the same regardless of framework. Although setup.py is a generic construct, some advice is given on the Cloud ML Engine documentation page (link).
In particular, the recommended project structure figure on that page should be helpful.
In your case, the code snippet that does knn_pipe.fit would be inside of trainer and custscaler would be the "other_subpackage" in the figure.
The "magic bit" in setup.py is the line:
packages=find_packages()
which will include trainer and custscaler (assuming each has an __init__.py); a minimal sketch of such a setup.py is below.
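A minimal setup.py sketch along those lines (the package name, version, and requirements are placeholders to adapt):

from setuptools import setup, find_packages

setup(
    name="trainer",                     # placeholder package name
    version="0.1",
    packages=find_packages(),           # picks up trainer/ and custscaler/
    install_requires=["scikit-learn"],  # add any other runtime dependencies here
)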
