Azure Machine Learning Studio - getting error in consumption of endpoint (python-3.x)

I have deployed a machine learning model as a pickle file in Azure Machine Learning, and the endpoint is created. Now I am trying to consume the endpoint with the following code:
import requests
import numpy as np
# send a random row from the test set to score
random_index = np.random.randint(0, len(X_test) - 1)
input_data = '{"data": [' + str(list(X_test[random_index])) + "]}"
headers = {"Content-Type": "application/json"}
resp = requests.post(service.scoring_uri, input_data, headers=headers)
print("POST to url", service.scoring_uri)
print("prediction:", resp.text)
It's giving an error with the following message:
prediction: {"data": "Expecting value: line 1 column 12 (char 11)", "message": "Failed to predict"}
The data looks like:
X_test => array([[[0. ], [0.274710], [0.403273]]])
so the expression
'{"data": [' + str(list(X_test[random_index])) + "]}"
converts it to
'{"data": [[array([0.]), array([0.274710]), array([0.403273])]]}'
which is not valid JSON, because str() on a list of NumPy arrays embeds the literal text array(...) in the payload.

The current code uses the POST method, but to consume the endpoints it was suggested to use the GET method.
import requests
import os
import base64
import json
personal_access_token = ":"+os.environ["AZ_ACCESS_TOKEN"]
headers = {}
headers['Content-type'] = "application/json"
headers['Authorization'] = b'Basic ' + base64.b64encode(personal_access_token.encode('utf-8'))
#Get a list of agent pools.
instance = "dev.azure.com/name"
propVals = "{name=Default,isHosted=false}"
api_version = "version _number”
uri = ("complete uri”)
r = requests.get(uri, headers=headers)
To get the complete URI, use the GET method with poolName embedded in the syntax:
https://dev.azure.com/{organization}/_apis/distributedtask/pools?poolName={poolName}&api-version=5.1
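As a usage sketch, that template can be dropped into the uri placeholder of the code above, leaving {organization} and {poolName} for you to fill in:
uri = ("https://dev.azure.com/{organization}/_apis/distributedtask/pools"
       "?poolName={poolName}&api-version=5.1")
r = requests.get(uri, headers=headers)
print(r.status_code, r.json())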
If you keep using the POST method instead, change the line below (line 6 of the code in the question):
input_data = '{"data": [' + str(list(X_test[random_index])) + "]}"
The single and double quotation marks in it are easy to misplace, so check that they pair up exactly as shown.

Related

Azure ML Pipeline: Glob patterns inside the path are not supported by the volume mount

I'm trying to run an Azure ML pipeline using the Azure ML Python SDK v2. The input to the pipeline is a data asset whose data source is the default blob store. Its path is azureml:raw_data_v2:1 and it is of type URI_FOLDER. I'm getting the following error when running the pipeline:
[2022-11-03 16:10:29Z] Job failed, job RunId is
fca7d858-2b46-43bb-89e8-0481631eafbe. Error:
{"Error":{"Code":"UserError","Severity":null,"Message":"{"NonCompliant":"ArgumentError(InvalidArgument
{ argument: \"arguments.path\", expected: \"Glob patterns inside
the path are not supported by the volume mount.Path must be a direct
path to the file or folder, or end with
'/' or '/**'
to match the entire content of the volume. \", actual:
\"REDACTED\" })"}\n{\n "code":
"data-capability.UriMountSession.PyFuseError", \n "target":
"",\n "category": "UserError",\n "error_details": [\n {\n
"key": "NonCompliantReason", \n "value":
"ArgumentError(InvalidArgument { argument: \"arguments.path\",
expected: \"Glob patterns inside the path are not supported by the
volume mount.Path must be a direct path to the file or folder, or end
with '/' or '/**' to match the entire content of the volume.\",
actual: \"REDACTED\" })"\n }, \n {\n "key":
"StackTrace",\n "value": " File
\"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/capability_session.py\",
line 70, in start\n (data_path, sub_data_path) =
session.start()\n\n File
\"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/data_capability/data_sessions.py\",
line 364, in start\n options=mnt_options\n\n File
\"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/azureml/dataprep/fuse/dprepfuse.py\",
line 696, in rslex_uri_volume_mount\n raise e\n\n File
\"/opt/miniconda/envs/data-capability/lib/python3.7/site-packages/azureml/dataprep/fuse/dprepfuse.py\",
line 690, in rslex_uri_volume_mount\n mount_context =
RslexDirectURIMountContext(mount_point, uri, options)\n"\n }\n
]\n}",
"MessageFormat":null,"MessageParameters":{},"ReferenceCode":null,"DetailsUri":null,"Target":null,"Details":[],
"InnerError":null,"DebugInfo":null,"AdditionalInfo":null},"Correlation":null,"Environment":null,"Location":null,
"Time":"0001-01-01T00:00:00+00:00","ComponentName":null}
The most important part is
ArgumentError(InvalidArgument { argument: "arguments.path",
expected: "Glob patterns inside the path are not supported by the
volume mount.Path must be a direct path to the file or folder, or end
with '/' or '/**'
to match the entire content of the volume.
I'm assuming this is happening when it's trying to mount my input data to the Docker container that will run my pipeline, but I'm not sure. Here is my complete code for the pipeline:
from azure.identity import DefaultAzureCredential, ManagedIdentityCredential
from azure.ai.ml import MLClient, Input, Output, command
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.constants import AssetTypes
from mldesigner import command_component
import os
import pandas as pd
import numpy as np
import glob
from PIL import Image
import json
import pickle
os.environ['AZURE_TENANT_ID'] = 'xxx-xxx-xxx-xxx'
credential = DefaultAzureCredential()
# Check if given credential can get token successfully.
credential.get_token("https://management.azure.com/.default")
ml_client = MLClient.from_config(credential=credential)
def get_val_test_filenames(input_ml_path, dataset):
    df = pd.read_csv(f'{input_ml_path}/evaluation/{dataset}_reference_slides.csv', encoding='utf-8')
    slides = df['Slide_id'].to_list()
    return slides

def create_id(path):
    parts = path.split('/')
    file_name = parts[-1][:-4]
    hash_str = parts[-2]
    doc = parts[-3]
    id = f'{doc}__{hash_str}__{file_name}'
    return id

def create_y_val(input_ml_path, val_files):
    y_val = []
    with open(f'{input_ml_path}/evaluation/golden_dev.json') as y_val_file:
        val_dict = json.load(y_val_file)
    for vf in val_files:
        sim_list = val_dict[vf]
        y_val.append(sim_list)
    return y_val  # this should be list of lists

# x_train, x_val, x_test, y_val
def create_no_hier_datasets(input_ml_path, output_ml_path):
    print(f'************* inside create no hier datasets *********************')
    train_dir = f'{input_ml_path}/raw/images/final_slides/'
    val_slides = get_val_test_filenames(input_ml_path, 'val')
    test_slides = get_val_test_filenames(input_ml_path, 'test')
    x_train, x_val, x_test, y_val = [], [], [], []
    cnt = 0
    for filename in glob.iglob(train_dir + '**/thumb*.jpg', recursive=True):
        if 'small' in filename:
            continue
        img_np = np.asarray(Image.open(filename))
        if img_np.shape != (768, 1024, 3):
            print(f'{img_np.shape} does not equal (768, 1024, 3)')
            continue
        id = create_id(filename)
        if id in val_slides:
            x_val.append(img_np)
            y_val.append(filename)
        elif id in test_slides:
            x_test.append(img_np)
        else:
            x_train.append(img_np)
    x_train_np = np.asarray(x_train)
    x_val_np = np.asarray(x_val)
    x_test_np = np.asarray(x_test)
    y_val_list = create_y_val(input_ml_path, y_val)
    np.save(f"{output_ml_path}/x_train.npy", x_train_np)
    np.save(f"{output_ml_path}/x_val.npy", x_val_np)
    np.save(f"{output_ml_path}/x_test.npy", x_test_np)
    with open(f"{output_ml_path}/y_val.npy", 'wb') as fp:
        pickle.dump(y_val_list, fp)

output_version = '1'
input_version = '1'

@command_component(
    name="create_train_test_data",
    version="1",
    display_name="Create train and test data",
    description="creates train and test data",
    environment=dict(
        conda_file="conda.yml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    )
)
def create_train_test_data_component(
    input_data: Input(type=AssetTypes.URI_FOLDER),
    output_data: Output(type=AssetTypes.URI_FOLDER),
):
    create_no_hier_datasets(input_data, output_data)

@pipeline(compute='cpu-cluster', description="pipeline to create train and test data")
def data_prep_pipeline(pipeline_input_data):
    create_data_node = create_train_test_data_component(input_data=pipeline_input_data)

raw_data_ds = Input(type=AssetTypes.URI_FOLDER, path="azureml:raw_data_v2:1")
output_data_ds = Output(type=AssetTypes.URI_FOLDER, path="azureml:train_test_data:1")
pipeline_job = data_prep_pipeline(pipeline_input_data=raw_data_ds)
pipeline_job = ml_client.jobs.create_or_update(pipeline_job, experiment_name="no_hierarchy")
As the error seems to be happening before my code begins to run, I really don't understand what's going wrong. Does anyone have experience with this?
It's by design that uri_folder doesn't support globbing in the path.
What's your scenario where you need globbing in the URI path?
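If the underlying need is just "mount the folder and pick out matching files", a minimal sketch of the workaround, under the assumption that the registered asset's path contains a wildcard, is to register the data asset against the plain folder (no '*' anywhere in the path) and keep the pattern matching inside the component, as create_no_hier_datasets already does with glob.iglob; the version number and datastore path below are placeholders:
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data
# Register the folder itself; the path must not contain glob characters.
raw_data = Data(
    name="raw_data_v2",
    version="2",  # hypothetical new version
    type=AssetTypes.URI_FOLDER,
    path="azureml://datastores/workspaceblobstore/paths/raw_data/",  # placeholder folder path
)
ml_client.data.create_or_update(raw_data)
# Reference it as a direct uri_folder input; any '**/thumb*.jpg' matching stays in the script.
raw_data_ds = Input(type=AssetTypes.URI_FOLDER, path="azureml:raw_data_v2:2")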

How can I get all klines for symbols on KuCoin with Python?

I want to get all futures symbols on KuCoin and then get klines for every symbol.
I wrote some code, but I can't get all the symbols, and I also can't get the klines.
Get symbols:
import requests
import pandas as pd
url = " https://api-futures.kucoin.com/api/v1/contracts/active"
payload={}
files={}
headers = {}
margin = requests.request("GET", url, headers=headers, data=payload, files=files)
margin=margin.json()
margin=margin['data']
margin=pd.DataFrame(margin)
pd.set_option('display.max_row', margin.shape[0]+1)
It doesn't return some futures symbols, for example BTCUSDT.
Get klines:
import pandas as pd
import requests
url = "https://api-futures.kucoin.com/api/v1/kline/query?symbol=.KXBT&granularity=480&from=1535302400000"
payload={}
files={}
headers ={}
df = requests.request("GET", url, headers=headers, data=payload, files=files)
df=df.json()
df=df['data']
df=pd.DataFrame(df)
df[0] = pd.to_datetime(df[0], unit='ms')
df['date'] = df[0].dt.strftime("%d/%m/%Y")
df['time-utc'] = df[0].dt.strftime("%H:%M:%S")
df
I have never seen such a symbol (.KXBT).
Also, when I use other symbols like BTCUSDT, I can't get any data.
I read the docs but couldn't get anything to work.
from kucoin_futures.client import Market
client = Market(url='https://api-futures.kucoin.com')
all_future_tick = client.get_contracts_list()
for word in all_future_tick:
    part_1 = word['baseCurrency']
    mid = '-'
    part_2 = word['quoteCurrency']
    combined = part_1 + mid + part_2
    print(combined)
This will give you the names of the coins.
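To then pull klines for every contract, a hedged sketch using the same REST endpoints as the question; it assumes the kline endpoint wants each contract's symbol field (e.g. XBTUSDTM) rather than the base-quote name built above:
import requests
import pandas as pd
contracts = requests.get("https://api-futures.kucoin.com/api/v1/contracts/active").json()["data"]
all_klines = {}
for contract in contracts:
    symbol = contract["symbol"]  # assumed to be the identifier the kline endpoint expects
    resp = requests.get(
        "https://api-futures.kucoin.com/api/v1/kline/query",
        params={"symbol": symbol, "granularity": 480},
    )
    data = resp.json().get("data")
    if not data:
        continue  # skip contracts for which no kline data comes back
    df = pd.DataFrame(data)
    df[0] = pd.to_datetime(df[0], unit="ms")
    all_klines[symbol] = df
print(list(all_klines)[:5])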

Python 3 JSON with Japanese characters

I am using AWS Lambda to return a simple JSON response with some Japanese characters.
I can't seem to get the characters to display correctly.
Here's what my code looks like:
import datetime
import json

def lambda_handler(event, context):
    minutes = datetime.datetime.now().minute
    status = ""
    if minutes < 10:
        status = u"良好"
    else:
        status = u"不良"
    response = {}
    response['ID'] = 1
    response['Status'] = status
    data = json.dumps(response, indent=2, ensure_ascii=False)
    data = json.loads(data)
    return data
The above code returns:
{"ID": 1, "Status": "\u4e0d\u826f"}
I have also tried this:
data = json.dumps(response, indent=2, ensure_ascii=False).encode('utf-8')
But to no avail.
How can I get the response to return Japanese characters?
Edit:
One more thing I noticed: in the browser I get the JSON output above, but when running a test in the AWS console the characters are displayed properly. What does this mean?
Isn't it your terminal's problem?
I got the Japanese characters displayed correctly in my Mac terminal.
import json
minutes = 9
status = ""
if minutes < 10:
    status = u"良好"
else:
    status = u"不良"
response = {}
response['ID'] = 1
response['Status'] = status
data = json.dumps(response, indent=2, ensure_ascii=False)
data = json.loads(data)
print(data)
{'ID': 1, 'Status': '良好'}
https://ideone.com/XZeVkS
In case you are running Flask and ensure_ascii did not fix the problem, you have to disable ASCII encoding at the app level this way:
app = Flask(__name__)
app.config['JSON_AS_ASCII'] = False
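If the Lambda sits behind an API Gateway proxy integration (an assumption, since the question only mentions viewing the output in a browser), returning a plain dict lets the runtime re-serialize it with ASCII escaping; a minimal sketch is to serialize the body yourself and declare the charset:
import datetime
import json
def lambda_handler(event, context):
    status = u"良好" if datetime.datetime.now().minute < 10 else u"不良"
    # Pre-serialize the body with ensure_ascii=False so the runtime does not
    # re-encode the Japanese characters as \uXXXX escapes.
    body = json.dumps({"ID": 1, "Status": status}, ensure_ascii=False)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json; charset=utf-8"},
        "body": body,
    }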

Download survey results from Qualtrics into Python

I am trying to get the survey responses from Qualtrics directly into a pandas DataFrame in Python. Is there a way of doing so?
import shutil
import os
import requests
import zipfile
import json
import io
# Setting user Parameters
# apiToken = "myKey"
# surveyId = "mySurveyID"
# fileFormat = "csv"
# dataCenter = "az1"
apiToken = "HfDjOn******"
surveyId = "SV_868******"
fileFormat = "csv"
dataCenter = 'uebs.eu'
# Setting static parameters
requestCheckProgress = 0
progressStatus = "in progress"
baseUrl = "https://{0}.qualtrics.com/API/v3/responseexports/".format(dataCenter)
headers = {
    "content-type": "application/json",
    "x-api-token": apiToken,
}
Then, for step 1 (creating the data export), I set:
downloadRequestUrl = baseUrl
When I try to access that URL from Chrome, it gives me the following:
{"meta":{"httpStatus":"404 - Not Found","error":{"errorMessage":"The requested resource does not exist."}}}
I believe this is the main reason why, after running this code,
# Step 1: Creating Data Export
downloadRequestUrl = baseUrl
downloadRequestPayload = '{"format":"' + fileFormat + '","surveyId":"' + surveyId + '"}'
downloadRequestResponse = requests.request("POST", downloadRequestUrl, data=downloadRequestPayload, headers=headers)
progressId = downloadRequestResponse.json()["result"]["id"]
print(downloadRequestResponse.text)
it gives me this error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-38-cd611e49879c> in <module>
3 downloadRequestPayload = '{"format":"' + fileFormat + '","surveyId":"' + surveyId + '"}'
4 downloadRequestResponse = requests.request("POST", downloadRequestUrl, data=downloadRequestPayload, headers=headers)
----> 5 progressId = downloadRequestResponse.json()["result"]["id"]
6 print(downloadRequestResponse.text)
KeyError: 'result'
I am somewhat new to the Qualtrics/Python interface. Could someone share why I am having this difficulty? Is it because of the dataCenter?
Thank you.
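Since the 404 shows the export request never reaches a valid endpoint (the {dataCenter} part of baseUrl is a common suspect, though that is only a guess here), a hedged debugging sketch, reusing the question's baseUrl, headers, and payload, is to check the response before indexing into it, so the KeyError is replaced by Qualtrics' own error message:
downloadRequestResponse = requests.post(downloadRequestUrl, data=downloadRequestPayload, headers=headers)
body = downloadRequestResponse.json()
if downloadRequestResponse.status_code != 200 or "result" not in body:
    # Surface the API's own error (e.g. the 404 "requested resource does not exist")
    # instead of failing later with KeyError: 'result'.
    raise RuntimeError(f"Export request failed: {downloadRequestResponse.status_code} {body}")
progressId = body["result"]["id"]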

SageMaker giving UnicodeDecodeError while deserializing

In SageMaker, I was able to load and deploy a model from S3. While deserializing the data for prediction, I am getting "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd7 in position 2: invalid continuation byte" on the line
results = predictor.predict(test_X)
I tried the following SageMaker example: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_applying_machine_learning/linear_time_series_forecast/linear_time_series_forecast.ipynb. I was able to train, validate, and deploy the model and store it in S3.
After this I wanted to import the model from S3 into SageMaker and test using the imported model. I was able to load and deploy it, but when predicting for test values, I get the UnicodeDecodeError.
from sagemaker.predictor import csv_serializer, json_deserializer
role = get_execution_role()
sagemaker_session = sagemaker.Session()
model_data = sagemaker.session.s3_input(model_file_location_in_s3,
                                        distribution='FullyReplicated',
                                        content_type='application/x-sagemaker-model',
                                        s3_data_type='S3Prefix')
sagemaker_model = sagemaker.LinearLearnerModel(model_data=model_file,
                                               role=role,
                                               sagemaker_session=sagemaker_session)
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.t2.medium')
#loading test data
gas = pd.read_csv('gasoline.csv', header=None, names=['thousands_barrels'],encoding='utf-8')
gas['thousands_barrels_lag1'] = gas['thousands_barrels'].shift(1)
gas['thousands_barrels_lag2'] = gas['thousands_barrels'].shift(2)
gas['thousands_barrels_lag3'] = gas['thousands_barrels'].shift(3)
gas['thousands_barrels_lag4'] = gas['thousands_barrels'].shift(4)
gas['trend'] = np.arange(len(gas))
gas['log_trend'] = np.log1p(np.arange(len(gas)))
gas['sq_trend'] = np.arange(len(gas)) ** 2
weeks = pd.get_dummies(np.array(list(range(52)) * 15)[:len(gas)], prefix='week')
gas = pd.concat([gas, weeks], axis=1)
gas = gas.iloc[4:, ]
split_train = int(len(gas) * 0.6)
split_test = int(len(gas) * 0.3)
test_y = gas['thousands_barrels'][split_test:]
test_X = gas.drop('thousands_barrels', axis=1).iloc[split_test:, ].as_matrix()
predictor.content_type = 'text/csv'
predictor.serializer = csv_serializer
predictor.deserializer = json_deserializer
results = predictor.predict(test_X)
one_step = np.array([r['score'] for r in results['predictions']])
The program works fine when the model is trained and deployed (as in the example), but when loading it from S3, it throws this error.
The test data is a NumPy ndarray.
The deserializer does not seem to be appropriate for the content of the response.
To investigate, write a custom deserializer that just prints some details:
def debug_deserializer(data, content_type):
    print(content_type)
    print(data)
and apply it like:
predictor.deserializer = debug_deserializer
This could, for example, yield something like this:
application/x-recordio-protobuf
<botocore.response.StreamingBody object at 0x7fd3544883c8>
None
This tells you the content type is application/x-recordio-protobuf. Then write a custom deserializer, for example:
from sagemaker.amazon.common import RecordDeserializer
def recordio_protobuf_deserialize(data, content_type):
    rec_des = RecordDeserializer()
    return rec_des.deserialize(data, content_type)
and apply like:
predictor.deserializer = recordio_protobuf_deserialize
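As a follow-up usage sketch, and only under the assumption that the deserialized result is an iterable of recordio-protobuf Record messages with the linear-learner regression output stored under the 'score' label (mirroring the one_step line in the question):
predictor.deserializer = recordio_protobuf_deserialize
results = predictor.predict(test_X)
# Each Record is assumed to carry its prediction in the label map under 'score'.
one_step = np.array([r.label['score'].float32_tensor.values[0] for r in results])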
