I've been playing around with BERT and TensorFlow, following the example here, and now have a trained, working model.
I then wanted to save and deploy the model, so I used the export_saved_model function, which requires you to build a serving_input_fn to handle any incoming requests when the model is reloaded.
I wanted to be able to pass a single string for sentiment analysis to the deployed model, rather than having a theoretical client-side application do the tokenisation, feature generation, etc., so I tried to write an input function that would handle that and pass the constructed features to the model. Is this possible? I wrote the following, which I feel should do what I want:
import json
import base64

def plain_text_serving_input_fn():
    input_string = tf.placeholder(dtype=tf.string, shape=None, name='input_string_text')

    # What format to expect input in.
    receiver_tensors = {'input_text': input_string}

    input_examples = [run_classifier.InputExample(guid="", text_a=str(input_string), text_b=None, label=0)]  # here, "" is just a dummy label
    input_features = run_classifier.convert_examples_to_features(input_examples, label_list, MAX_SEQ_LENGTH, tokenizer)

    variables = {}
    for i in input_features:
        variables["input_ids"] = i.input_ids
        variables["input_mask"] = i.input_mask
        variables["segment_ids"] = i.segment_ids
        variables["label_id"] = i.label_id

    feature_spec = {
        "input_ids": tf.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
        "input_mask": tf.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
        "segment_ids": tf.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
        "label_ids": tf.FixedLenFeature([], tf.int64)
    }

    string_variables = json.dumps(variables)
    encode_input = base64.b64encode(string_variables.encode('utf-8'))
    encode_string = base64.decodestring(encode_input)

    features_to_input = tf.parse_example([encode_string], feature_spec)

    return tf.estimator.export.ServingInputReceiver(features_to_input, receiver_tensors)
I would expect this to allow me to call predict on my deployed model with:
variables = {"input_text" : "This is some test input"}
predictor.predict(variables)
I've tried a range of variations of this (putting it in an array, converting to base64, etc.), but I get a range of errors, either telling me
{ "error": "Failed to process element: 0 of 'instances' list. Error: Invalid argument: JSON Value: {\n \"input_text\": \"This is some test input\"\n} not formatted correctly for base64 data" }
or
Object of type 'bytes' is not JSON serializable
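For what it's worth, TensorFlow Serving's REST API (and Cloud ML-style prediction endpoints) expect DT_STRING/bytes tensor values to be wrapped in a {"b64": ...} object inside an "instances" list, which is what the first error seems to be hinting at. Below is a minimal client-side sketch of that convention, reusing the input_text name from the code above; whether the exported signature actually wants the raw string or this wrapper depends on how the SavedModel was exported, so treat it as an assumption.

import base64
import json

# Hypothetical request body following the REST "b64" convention for string tensors.
text = "This is some test input"
request_body = {
    "instances": [
        {"input_text": {"b64": base64.b64encode(text.encode("utf-8")).decode("utf-8")}}
    ]
}
print(json.dumps(request_body))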
I suspect I'm formatting my requests incorrectly, but I also can't find any examples of this being done in a serving_input_fn, so has anyone ever done something similar?
I want to do inference using OpenVINO, but I get an error when using it. Is there any way to solve it?
# Assumed imports (not in the original snippet): tf.keras, tf2onnx/onnx, and the legacy OpenVINO IECore API
import onnx
import tf2onnx
from tensorflow import keras
from openvino.inference_engine import IECore

# Convert the Keras model to ONNX and save it
model = keras.models.load_model('/resnet50.h5')
onnx_model, _ = tf2onnx.convert.from_keras(model, opset=16)
onnx.save(onnx_model, '/t1_model.onnx')

# Read the ONNX model with the legacy Inference Engine API
ie = IECore()
net = ie.read_network("/t1_model.onnx")
input_name = list(net.input_info.keys())[0]
output_name = list(net.outputs.keys())[0]
net.input_info[input_name].precision = 'FP32'
net.outputs[output_name].precision = 'FP32'

exec_net = ie.load_network(network=net, device_name='CPU')  # the error occurs here
I faced this problem:
RuntimeError: Check 'std::get<0>(valid)' failed at C:\j\workspace\private-ci\ie\build-windows-vs2019#3\b\repos\openvino\src\inference\src\ie_core.cpp:1414:
InferenceEngine::Core::LoadNetwork doesn't support inputs having dynamic shapes. Use ov::Core::compile_model API instead. Dynamic inputs are :{ input:'input_1,input_1', shape={?,256,256,3}}
input_shape = (None, 256,256,3)
The IECore API doesn't support dynamic shapes, so you need to make your model static before you load it into the plugin. You can use the reshape() method on the imported network, as sketched below.
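A minimal sketch of that, reusing the net and input_name variables from the question (the batch size of 1 is an assumption; any static value works):

# Reshape the imported network to a fully static input shape before loading it
net.reshape({input_name: [1, 256, 256, 3]})
exec_net = ie.load_network(network=net, device_name='CPU')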
As an alternative, you can switch to the 2022.1 version of OpenVINO, where dynamic shapes are supported. You have to switch from IECore to Core, read_network -> read_model, and load_network -> compile_model.
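A rough sketch of the equivalent calls in the 2022.1 API (path reused from the question):

from openvino.runtime import Core

core = Core()
model = core.read_model("/t1_model.onnx")          # replaces ie.read_network
compiled_model = core.compile_model(model, "CPU")  # replaces ie.load_network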
I'm calling a simple Python function in Google Cloud but cannot get it to save. It shows this error:
"Function failed on loading user code. This is likely due to a bug in the user code. Error message: Error: please examine your function logs to see the error cause: https://cloud.google.com/functions/docs/monitoring/logging#viewing_logs. Additional troubleshooting documentation can be found at https://cloud.google.com/functions/docs/troubleshooting#logging. Please visit https://cloud.google.com/functions/docs/troubleshooting for in-depth troubleshooting documentation."
The logs don't seem to show much that would indicate an error in the code. I followed this guide: https://blog.thereportapi.com/automate-a-daily-etl-of-currency-rates-into-bigquery/
The only differences are the environment variables and the endpoint I'm using.
The code is below; it is just a GET request followed by a push of the data into a table.
import requests
import json
import time
import os
from google.cloud import bigquery

# Set any default values for these variables if they are not found from Environment variables
PROJECT_ID = os.environ.get("PROJECT_ID", "xxxxxxxxxxxxxx")
EXCHANGERATESAPI_KEY = os.environ.get("EXCHANGERATESAPI_KEY", "xxxxxxxxxxxxxxx")
REGIONAL_ENDPOINT = os.environ.get("REGIONAL_ENDPOINT", "europe-west1")
DATASET_ID = os.environ.get("DATASET_ID", "currency_rates")
TABLE_NAME = os.environ.get("TABLE_NAME", "currency_rates")
BASE_CURRENCY = os.environ.get("BASE_CURRENCY", "SEK")
SYMBOLS = os.environ.get("SYMBOLS", "NOK,EUR,USD,GBP")

def hello_world(request):
    latest_response = get_latest_currency_rates()
    write_to_bq(latest_response)
    return "Success"

def get_latest_currency_rates():
    PARAMS = {'access_key': EXCHANGERATESAPI_KEY, 'symbols': SYMBOLS, 'base': BASE_CURRENCY}
    response = requests.get("https://api.exchangeratesapi.io/v1/latest", params=PARAMS)
    print(response.json())
    return response.json()

def write_to_bq(response):
    # Instantiates a client
    bigquery_client = bigquery.Client(project=PROJECT_ID)

    # Prepares a reference to the dataset
    dataset_ref = bigquery_client.dataset(DATASET_ID)
    table_ref = dataset_ref.table(TABLE_NAME)
    table = bigquery_client.get_table(table_ref)

    # get the current timestamp so we know how fresh the data is
    timestamp = time.time()

    jsondump = json.dumps(response)  # Returns a string
    # Ensure the Response is a String not JSON
    rows_to_insert = [{"timestamp": timestamp, "data": jsondump}]
    errors = bigquery_client.insert_rows(table, rows_to_insert)  # API request
    print(errors)
    assert errors == []
I tried just the part that does the GET request in an offline editor and can confirm that a response comes back fine. I suspect it might have something to do with permissions or the way the script tries to access the database.
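In case it helps reproduce this outside of Cloud Functions, a trivial local smoke test might look like the following (a sketch only, assuming the code above lives in main.py and credentials/environment variables are available locally; it is not part of the deployed function):

# Hypothetical local test: call the entry point directly, outside Cloud Functions.
# hello_world() ignores its request argument, so passing None is enough here.
if __name__ == "__main__":
    print(hello_world(None))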
The ParallelRunStep Documentation suggests the following:
A named input Dataset (DatasetConsumptionConfig class)
path_on_datastore = iris_data.path('iris/')
input_iris_ds = Dataset.Tabular.from_delimited_files(path=path_on_datastore, validate=False)
named_iris_ds = input_iris_ds.as_named_input(iris_ds_name)
Which is just passed as an Input:
distributed_csv_iris_step = ParallelRunStep(
    name='example-iris',
    inputs=[named_iris_ds],
    output=output_folder,
    parallel_run_config=parallel_run_config,
    arguments=['--model_name', 'iris-prs'],
    allow_reuse=False
)
The Documentation to submit Dataset Inputs as Parameters suggests the following:
The Input is a DatasetConsumptionConfig class element
tabular_dataset = Dataset.Tabular.from_delimited_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
tabular_pipeline_param = PipelineParameter(name="tabular_ds_param", default_value=tabular_dataset)
tabular_ds_consumption = DatasetConsumptionConfig("tabular_dataset", tabular_pipeline_param)
Which is passed in arguments as well as in inputs:
train_step = PythonScriptStep(
    name="train_step",
    script_name="train_with_dataset.py",
    arguments=["--param2", tabular_ds_consumption],
    inputs=[tabular_ds_consumption],
    compute_target=compute_target,
    source_directory=source_directory)
When submitting with a new parameter, we create a new Dataset instance:
iris_tabular_ds = Dataset.Tabular.from_delimited_files('some_link')
And submit it like this:
pipeline_run_with_params = experiment.submit(pipeline, pipeline_parameters={'tabular_ds_param': iris_tabular_ds})
However, how do we combine the two: how do we pass a Dataset input as a parameter to the ParallelRunStep?
If we create a DatasetConsumptionConfig class element like so:
tabular_dataset = Dataset.Tabular.from_delimited_files('https://dprepdata.blob.core.windows.net/demo/Titanic.csv')
tabular_pipeline_param = PipelineParameter(name="tabular_ds_param", default_value=tabular_dataset)
tabular_ds_consumption = DatasetConsumptionConfig("tabular_dataset", tabular_pipeline_param)
And pass it as an argument to the ParallelRunStep, it will throw an error; a rough reconstruction of that attempt follows.
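This is hypothetical, reusing the names from the snippets above; the exact arguments used are an assumption:

# Assumed attempt: feed the DatasetConsumptionConfig to ParallelRunStep the same
# way the PythonScriptStep example does (names reused from the snippets above).
distributed_step = ParallelRunStep(
    name='example-iris',
    inputs=[tabular_ds_consumption],
    arguments=['--param2', tabular_ds_consumption],
    output=output_folder,
    parallel_run_config=parallel_run_config,
    allow_reuse=False
)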
References:
Notebook with Dataset Input Parameter
ParallelRunStep Notebook
AML ParallelRunStep GA is a managed solution to scale up and out large ML workloads, including batch inference, training, and large data processing. Please check out the documents below for details.
• Overview doc: run batch inference using ParallelRunStep
• Sample notebooks
• AI Show: How to do Batch Inference using AML ParallelRunStep
• Blog: Batch Inference in Azure Machine Learning
For the inputs we create Dataset class instances:
tabular_ds1 = Dataset.Tabular.from_delimited_files('some_link')
tabular_ds2 = Dataset.Tabular.from_delimited_files('some_link')
ParallelRunStep produces an output file, so we use the PipelineData class to create a folder which will store this output:
from azureml.pipeline.core import Pipeline, PipelineData
output_dir = PipelineData(name="inferences", datastore=def_data_store)
ParallelRunStep depends on the ParallelRunConfig class to include details about the environment, entry script, output file name, and other necessary definitions:
from azureml.pipeline.core import PipelineParameter
from azureml.pipeline.steps import ParallelRunStep, ParallelRunConfig
parallel_run_config = ParallelRunConfig(
    source_directory=scripts_folder,
    entry_script=script_file,
    mini_batch_size=PipelineParameter(name="batch_size_param", default_value="5"),
    error_threshold=10,
    output_action="append_row",
    append_row_file_name="mnist_outputs.txt",
    environment=batch_env,
    compute_target=compute_target,
    process_count_per_node=PipelineParameter(name="process_count_param", default_value=2),
    node_count=2
)
The input to ParallelRunStep is created using the following code:
tabular_pipeline_param = PipelineParameter(name="tabular_ds_param", default_value=tabular_ds1)
tabular_ds_consumption = DatasetConsumptionConfig("tabular_dataset", tabular_pipeline_param)
The PipelineParameter helps us run the pipeline for different datasets.
ParallelRunStep consumes this as an input:
parallelrun_step = ParallelRunStep(
    name="some-name",
    parallel_run_config=parallel_run_config,
    inputs=[tabular_ds_consumption],
    output=output_dir,
    allow_reuse=False
)
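The step is then wrapped in a Pipeline and submitted once with the default dataset. A minimal sketch, assuming an existing Workspace object ws and an arbitrary experiment name:

from azureml.core import Experiment
from azureml.pipeline.core import Pipeline

# Assumes `ws` is an existing Workspace; the experiment name is arbitrary.
pipeline = Pipeline(workspace=ws, steps=[parallelrun_step])
experiment = Experiment(ws, "parallel-run-dataset-param")
pipeline_run_1 = experiment.submit(pipeline)  # uses tabular_ds1, the default_value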
To consume with another dataset:
pipeline_run_2 = experiment.submit(
    pipeline,
    pipeline_parameters={"tabular_ds_param": tabular_ds2}
)
There is currently a known issue: DatasetConsumptionConfig and PipelineParameter objects cannot be reused.
I'm using Python 3 to write a script that generates a customer report for Solarwinds N-Central. The script uses SOAP to query N-Central, and I'm using zeep for this project. While not new to Python, I am new to SOAP.
When calling the CustomerList function I'm getting: TypeError: __init__() got an unexpected keyword argument 'listSOs'
import zeep
wsdl = 'http://' + <server url> + '/dms/services/ServerEI?wsdl'
client = zeep.CachingClient(wsdl=wsdl)
config = {'listSOs': 'true'}
customers = client.service.CustomerList(Username=nc_user, Password=nc_pass, Settings=config)
Per the parameters below, 'listSOs' is not only a valid keyword, it's the only one accepted.
CustomerList
public com.nable.nobj.ei.Customer[] CustomerList(String username, String password, com.nable.nobj.ei.T_KeyPair[] settings) throws RemoteException
Parameters:
username - MSP N-central username
password - Corresponding MSP N-central password
settings - A list of non default settings stored in a T_KeyPair[]. Below is a list of the acceptable Keys and Values. If not used leave null
(Key) listSOs - (Value) "true" or "false". If true only SOs will be shown, if false only customers and sites will be shown. Default value is false.
I've also tried passing the dictionary as part of a list:
config = []
key = {'listSOs': 'true'}
config += key
TypeError: Any element received object of type 'str', expected lxml.etree._Element or builtins.dict or zeep.objects.T_KeyPair
Omitting the Settings value entirely:
customers = client.service.CustomerList(Username=nc_user, Password=nc_pass)
zeep.exceptions.ValidationError: Missing element Settings (CustomerList.Settings)
And trying zeep's SkipValue:
customers = client.service.CustomerList(Username=nc_user, Password=nc_pass, Settings=zeep.xsd.SkipValue)
zeep.exceptions.Fault: java.lang.NullPointerException
I'm probably missing something simple, but I've been banging my head against the wall off and on over this for a while, and I'm hoping someone can point me in the right direction.
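For reference, based on the T_KeyPair signature quoted above, the Settings argument presumably wants a list of key/value pairs rather than a dict keyed by the setting name. A hedged sketch of that, not verified against this WSDL:

# Hypothetical: build Settings as a list of T_KeyPair-shaped dicts;
# zeep maps plain dicts onto the fields of the complex type.
config = [{'key': 'listSOs', 'value': 'true'}]
customers = client.service.CustomerList(Username=nc_user, Password=nc_pass, Settings=config)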
Here's my source code from my getAssets.py script. I did it in Python 2.7, but it's easily upgradeable. Hope it helps someone else; N-central's API documentation is really bad lol.
# pip2.7 install zeep
import zeep, sys, csv, copy
from zeep import helpers

api_username = 'your_ncentral_api_user'
api_password = 'your_ncentral_api_user_pw'
wsdl = 'https://(yourdomain|tenant)/dms2/services2/ServerEI2?wsdl'

client = zeep.CachingClient(wsdl=wsdl)
response = client.service.deviceList(
    username=api_username,
    password=api_password,
    settings={
        'key': 'customerId',
        'value': 1
    }
)

# If you can't tell yet, I code sloppy
devices_list = []
device_dict = {}
dev_inc = 0
max_dict_keys = 0
final_keys = []

for device in response:
    # Iterate through all device nodes
    for device_properties in device.items:
        # Iterate through each device's properties and add it to a dict (keyed array)
        device_dict[device_properties.first] = device_properties.second

    # Dig further into device properties
    device_properties = client.service.devicePropertyList(
        username=api_username,
        password=api_password,
        deviceIDs=device_dict['device.deviceid'],
        reverseOrder=False
    )
    prop_ind = 0  # This is a hacky thing I did to make my CSV writing work
    for device_node in device_properties:
        for prop_tree in device_node.properties:
            for key, value in helpers.serialize_object(prop_tree).items():
                prop_ind += 1
                device_dict["prop" + str(prop_ind) + "_" + str(key)] = str(value)

    # Append the dict to a list (array), giving us a multi dimensional array;
    # you need to do a deep copy, as .copy will act like a pointer
    devices_list.append(copy.deepcopy(device_dict))

    # check to see the amount of keys in the last item
    if len(devices_list[-1].keys()) > max_dict_keys:
        max_dict_keys = len(devices_list[-1].keys())
        final_keys = devices_list[-1].keys()

print "Gathered all the datas of N-central devices count: ", len(devices_list)

# Write the data out to a CSV
with open('output.csv', 'w') as csvfile:
    fieldnames = final_keys
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for csv_line in devices_list:
        writer.writerow(csv_line)
My avsc file is as follows:
{"type":"record",
"namespace":"testing.avro",
"name":"product",
"aliases":["items","services","plans","deliverables"],
"fields":
[
{"name":"id", "type":"string" ,"aliases":["productid","itemid","item","product"]},
{"name":"brand", "type":"string","doc":"The brand associated", "default":"-1"},
{"name":"category","type":{"type":"map","values":"string"},"doc":"the list of categoryId, categoryName associated, send Id as key, name as value" },
{"name":"keywords", "type":{"type":"array","items":"string"},"doc":"this helps in long run in long run analysis, send the search keywords used for product"},
{"name":"groupid", "type":["string","null"],"doc":"Use this to represent or flag value of group to which it belong, e.g. it may be variation of same product"},
{"name":"price", "type":"double","aliases":["cost","unitprice"]},
{"name":"unit", "type":"string", "default":"Each"},
{"name":"unittype", "type":"string","aliases":["UOM"], "default":"Each"},
{"name":"url", "type":["string","null"],"doc":"URL of the product to return for more details on product, this will be used for event analysis. Provide full url"},
{"name":"imageurl","type":["string","null"],"doc":"Image url to display for return values"},
{"name":"updatedtime", "type":"string"},
{"name":"currency","type":"string", "default":"INR"},
{"name":"image", "type":["bytes","null"] , "doc":"fallback in case we cant provide the image url, use this judiciously and limit size"},
{"name":"features","type":{"type":"map","values":"string"},"doc":"Pass your classification attributes as features in key-value pair"}
]}
I am able to parse this, but when I try to write with it as follows, I keep getting an error. What am I missing? This is in Python 3. I verified it is well-formatted JSON, too.
from avro import schema as sc
from avro import datafile as df
from avro import io as avio
import os

_prodschema = 'product.avsc'
_namespace = 'testing.avro'

dirname = os.path.dirname(__file__)
avroschemaname = os.path.join(os.path.dirname(__file__), _prodschema)

sch = {}
with open(avroschemaname, 'r') as f:
    sch = f.read().encode(encoding='utf-8')
    f.close()

proschema = sc.Parse(sch)
print("Schema processed")

writer = df.DataFileWriter(open(os.path.join(dirname, "products.json"), 'wb'),
                           avio.DatumWriter(), proschema)
print("Just about to append the json")

writer.append({
    "id": "23232",
    "brand": "Relaxo",
    "category": [{"123": "shoe", "122": "accessories"}],
    "keywords": ["relaxo", "shoe"],
    "groupid": "",
    "price": "799.99",
    "unit": "Each",
    "unittype": "Each",
    "url": "",
    "imageurl": "",
    "updatedtime": "03/23/2017",
    "currency": "INR",
    "image": "",
    "features": [{"color": "black", "size": "10", "style": "contemperory"}]
})
writer.close()
What am I missing here?