Invoking a SageMaker endpoint from AWS Lambda throws a parameter validation error - python-3.x

I trained an ML model in AWS SageMaker and created an endpoint. I want to invoke it from AWS Lambda. My model has 30 predictor variables, so I passed them into the Lambda test event as a dict, as shown below:
{
"Time": "10 ",
"V1": "1.449043781 ",
"V2": "-1.176338825 ",
"V3": "0.913859833 ",
"V4": "-1.375666655 ",
"V5": "-1.971383165 ",
"V6": "-0.629152139 ",
"V7": "-1.423235601 ",
"V8": "0.048455888 ",
"V9": "-1.720408393 ",
"V10": "1.626659058 ",
"V11": "1.19964395 ",
"V12": "-0.671439778 ",
"V13": "-0.513947153 ",
"V14": "-0.095045045 ",
"V15": "0.230930409 ",
"V16": "0.031967467 ",
"V17": "0.253414716 ",
"V18": "0.854343814 ",
"V19": "-0.221365414 ",
"V20": "-0.387226474 ",
"V21": "-0.009301897 ",
"V22": "0.313894411 ",
"V23": "0.027740158 ",
"V24": "0.500512287 ",
"V25": "0.251367359 ",
"V26": "-0.129477954 ",
"V27": "0.042849871 ",
"V28": "0.016253262 ",
"Amount": "7.8"
}
I then ran the code below in AWS Lambda:
import json
import os
import csv
import boto3
import io
import codecs

endpoint_name = os.environ['ENDPOINT_NAME']
runtime = boto3.client('runtime.sagemaker')

def lambda_handler(event, context):
    print("received event: " + json.dumps(event, indent=2))
    data = json.loads(json.dumps(event))
    payload = data["Time"]+data["V1"]+data["V2"]+data["V3"]+data["V4"]+data["V5"]+data["V6"]+data["V7"]+data["V8"]+data["V9"]+data["V10"]+data["V11"]+data["V12"]+data["V13"]+data["V14"]+data["V15"]+data["V16"]+data["V17"]+data["V18"]+data["V19"]+data["V20"]+data["V21"]+data["V22"]+data["V23"]+data["V24"]+data["V25"]+data["V26"]+data["V27"]+data["V28"]+data["Amount"]
    payload = payload.split(" ")
    payload = [codecs.encode(i, 'utf-8') for i in payload]
    payload = [bytearray(i) for i in payload]
    print(payload)
    response = runtime.invoke_endpoint(EndpointName=endpoint_name, ContentType='text/csv', Body=payload)
    print(response)
    result = json.loads(response['Body'].decode())
    pred = int(float(response))
    predicted_label = 'fraud' if pred == 1 else 'not fraud'
    return predicted_label
This code throws the error below:
[ERROR] ParamValidationError: Parameter validation failed:
Invalid type for parameter Body, value: [bytearray(b'10'), bytearray(b'1.449043781'), bytearray(b'-1.176338825'), bytearray(b'0.913859833'), bytearray(b'-1.375666655'), bytearray(b'-1.971383165'), bytearray(b'-0.629152139'), bytearray(b'-1.423235601'), bytearray(b'0.048455888'), bytearray(b'-1.720408393'), bytearray(b'1.626659058'), bytearray(b'1.19964395'), bytearray(b'-0.671439778'), bytearray(b'-0.513947153'), bytearray(b'-0.095045045'), bytearray(b'0.230930409'), bytearray(b'0.031967467'), bytearray(b'0.253414716'), bytearray(b'0.854343814'), bytearray(b'-0.221365414'), bytearray(b'-0.387226474'), bytearray(b'-0.009301897'), bytearray(b'0.313894411'), bytearray(b'0.027740158'), bytearray(b'0.500512287'), bytearray(b'0.251367359'), bytearray(b'-0.129477954'), bytearray(b'0.042849871'), bytearray(b'0.016253262'), bytearray(b'7.8')], type: <class 'list'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object
I understand that I somehow need to pass my 30 features into the Lambda function so that the data type of payload is compatible with the ContentType and the invocation works. Can someone please explain how to do that?
Edit: I'm working on this problem by following this AWS blog post, but I don't quite understand how its author did it.

Pretty sure your error is coming from:
response = runtime.invoke_endpoint(EndpointName=endpoint_name,ContentType='text/csv',Body=payload)
but just FYI, it's very helpful when you ask a question about an exception to point out the line causing it :)
The invoke_endpoint() docs specify that Body has to be:
bytes or seekable file-like object
and that Body
provides input data, in the format specified in the ContentType request header.
Your content type is text/csv, so the endpoint will be expecting bytes that represent a CSV string. The question, then, is: what is it getting instead that's making it unhappy?
payload = data["Time"]+data["V1"]+ ...
At this point, payload is a string
payload = payload.split(" ")
Now it's a list of strings
payload = [codecs.encode(i,'utf-8') for i in payload]
Now it's a list of bytes objects (the strings encoded as UTF-8; this step might be unnecessary)
payload=[bytearray(i) for i in payload]
Now it's a list of bytearrays.
And that list is what you're passing as Body.
So, to get it to work, you'll need to change your logic to:
Make your original input into a CSV
Convert that CSV into one single byte array
Pass that array as part of your invocation
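For example, a minimal sketch of those three steps (not the blog's exact code; the feature order comes from your test event, and I'm assuming from your parsing logic that the endpoint returns a single float score):
import os
import boto3

runtime = boto3.client('runtime.sagemaker')
endpoint_name = os.environ['ENDPOINT_NAME']

# Time, V1..V28, Amount: the 30 features from the test event, in order
FEATURE_ORDER = ['Time'] + ['V{}'.format(i) for i in range(1, 29)] + ['Amount']

def lambda_handler(event, context):
    # Steps 1 and 2: build one comma-separated row and encode it as bytes
    csv_row = ','.join(str(event[name]).strip() for name in FEATURE_ORDER)
    # Step 3: pass those bytes as the Body of the invocation
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='text/csv',
        Body=csv_row.encode('utf-8'))
    result = response['Body'].read().decode('utf-8')
    pred = int(float(result))
    return 'fraud' if pred == 1 else 'not fraud'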
A note on style: passing the input numbers as strings with trailing spaces, just so you can split(" ") later, is pretty hacky. Pass the input as plain numbers (remove the " surrounding them) and do away with the appending and splitting (those are effectively inverse operations, so they cancel each other out).

Related

Azure python sdk service bus receive message

I am a bit confused about the Azure Python Service Bus SDK.
I have a Service Bus TOPIC and SUBSCRIPTION that listen for specific messages, and I have code to receive those messages, which are then processed by AWS Comprehend.
Following the Microsoft documentation, the basic code to receive the message works and I am able to print it, but when I integrate the same logic with Comprehend it fails.
Here is the example; this is the bit of code from the Microsoft documentation:
with servicebus_client:
    # get the Queue Receiver object for the queue
    receiver = servicebus_client.get_queue_receiver(queue_name=QUEUE_NAME, max_wait_time=5)
    with receiver:
        for msg in receiver:
            print("Received: " + str(msg))
            # complete the message so that the message is removed from the queue
            receiver.complete_message(msg)
and the output is this
{"ModuleId":"123458", "Text":"This is amazing."}
Receive is done.
My first thought was that the received message was a JSON object, so I started writing code to read the data as JSON, as follows:
servicebus_client = ServiceBusClient.from_connection_string(conn_str=CONNECTION_STR)

with servicebus_client:
    receiver = servicebus_client.get_subscription_receiver(
        topic_name=TOPIC_NAME,
        subscription_name=SUBSCRIPTION_NAME
    )
    with receiver:
        received_msgs = receiver.receive_messages(max_message_count=10, max_wait_time=5)
        for msg in received_msgs:
            # print(str(msg))
            message = json.dumps(msg)
            text = message['Text']
            # passing the text to comprehend
            result_json = json.dumps(comprehend.detect_sentiment(Text=text, LanguageCode='en'), sort_keys=True, indent=4)
            result = json.loads(result_json)  # converting json to python dictionary
            # extracting the sentiment value
            sentiment = result["Sentiment"]
            # extracting the sentiment score
            if sentiment == "POSITIVE":
                value = round(result["SentimentScore"]["Positive"] * 100, 2)
            elif sentiment == "NEGATIVE":
                value = round(result["SentimentScore"]["Negative"] * 100, 2)
            elif sentiment == "NEUTRAL":
                value = round(result["SentimentScore"]["Neutral"] * 100, 2)
            elif sentiment == "MIXED":
                value = round(result["SentimentScore"]["Mixed"] * 100, 2)
            # store the text, sentiment and value in a dictionary and convert it to JSON
            output = {'Text': text, 'Sentiment': sentiment, 'Value': value}
            output_json = json.dumps(output)
            print('Text: ', text, '\nSentiment: ', sentiment, '\nValue: ', value)
            print('In JSON format\n', output_json)
            receiver.complete_message(msg)
print("Receive is done.")
But when I run this I get the following error:
TypeError: Object of type ServiceBusReceivedMessage is not JSON serializable
Has this ever happened to anybody who can help me understand what type of object is coming back from the receive?
Thank you so much everyone
Has this ever happened to anybody who can help me understand what type
of object is coming back from the receive?
The type of the received message is ServiceBusReceivedMessage, which is derived from ServiceBusMessage. The contents of the message can be fetched from its body property (and, as your first snippet's output shows, str(msg) gives you the body as a string).
Can you please try parsing the body instead of serializing the message object, something like:
message = json.loads(str(msg))
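Putting that into your receive loop, a minimal sketch (assuming azure-servicebus v7 and that every message body is JSON like the example above; comprehend is the boto3 client from your code):
import json

with servicebus_client:
    receiver = servicebus_client.get_subscription_receiver(
        topic_name=TOPIC_NAME,
        subscription_name=SUBSCRIPTION_NAME
    )
    with receiver:
        for msg in receiver.receive_messages(max_message_count=10, max_wait_time=5):
            # str(msg) returns the message body as a string; json.loads parses it into a dict
            message = json.loads(str(msg))
            text = message["Text"]
            result = comprehend.detect_sentiment(Text=text, LanguageCode='en')
            print(text, result["Sentiment"])
            receiver.complete_message(msg)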

Is there a way to keep track of all the bad records that are allowed while loading an ndjson file into BigQuery?

I have a requirement where I need to keep track of all the bad records that were not fed into BigQuery after allowing max_bad_records, so I need them written to a file in storage for future reference. I'm using the BQ API for Python; is there a way we can achieve this? I think if we allow max_bad_records, we don't have the details of the failed loads in the BQ load job.
Thanks
Currently, there isn't a direct way of accessing and saving the bad records. However, you can access some job statistics, including the reason a record was skipped, within BigQuery's _job_statistics().
I have created an example, in order to demonstrate how the statistics will be shown. I have the following sample .csv file in a GCS bucket:
name,age
robert,25
felix,23
john,john
As you can see, the last row is a bad record, because I import age as INT64 and there is a string in that row. I used the following code to upload the file to BigQuery:
from google.cloud import bigquery
client = bigquery.Client()
table_ref = client.dataset('dataset').table('table_name')
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("age", "INT64"),
    ]
)
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
job_config.skip_leading_rows = 1
job_config.max_bad_records = 5
#job_config.autodetect = True
# The source format defaults to CSV, so the line below is optional.
job_config.source_format = bigquery.SourceFormat.CSV
uri = "gs://path/file.csv"
load_job = client.load_table_from_uri(
    uri, table_ref, job_config=job_config
)  # API request
print("Starting job {}".format(load_job.job_id))
load_job.result() # Waits for table load to complete.
print("Job finished.")
destination_table = client.get_table(table_ref)
print("Loaded {} rows.".format(destination_table.num_rows))
#Below all the statistics that might be useful in your case
job_state = load_job.state
job_id = load_job.job_id
error_result = load_job.error_result
job_statistics = load_job._job_statistics()
badRecords = job_statistics['badRecords']
outputRows = job_statistics['outputRows']
inputFiles = job_statistics['inputFiles']
inputFileBytes = job_statistics['inputFileBytes']
outputBytes = job_statistics['outputBytes']
print("***************************** ")
print(" job_state: " + str(job_state))
print(" non fatal error: " + str(load_job.errors))
print(" error_result: " + str(error_result))
print(" job_id: " + str(job_id))
print(" badRecords: " + str(badRecords))
print(" outputRows: " + str(outputRows))
print(" inputFiles: " + str(inputFiles))
print(" inputFileBytes: " + str(inputFileBytes))
print(" outputBytes: " + str(outputBytes))
print(" ***************************** ")
print("------ load_job.errors ")
The output from the statistics:
*****************************
job_state: DONE
non fatal errors: [{u'reason': u'invalid', u'message': u"Error while reading data, error message: Could not parse 'john' as INT64 for field age (position 1) starting at location 23", u'location': u'gs://path/file.csv'}]
error_result: None
job_id: b2b63e39-a5fb-47df-b12b-41a835f5cf5a
badRecords: 1
outputRows: 2
inputFiles: 1
inputFileBytes: 33
outputBytes: 26
*****************************
As shown above, the errors field returns the non-fatal errors, which include the bad records. In other words, it retrieves the individual errors generated by the job, whereas error_result returns error information for the job as a whole.
I believe these statistics might help you analyse your bad records. Lastly, you can output them to a file using write(), for example:
with open("errors.txt", "x") as f:
    # load_job.errors is a list of dicts, so convert it to a string before writing
    f.write(str(load_job.errors))
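Since you mentioned keeping them in storage for future reference, a possible follow-up is to upload that file to GCS (a minimal sketch; the bucket and object names here are just placeholders):
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket("my-error-logs-bucket")   # placeholder bucket name
blob = bucket.blob("load_errors/errors.txt")             # placeholder object path
blob.upload_from_filename("errors.txt")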

Doing feature generation in serving_input_fn for Tensorflow model

I've been playing around with BERT and TensorFlow following the example here and have a trained working model.
I then wanted to save and deploy the model, so I used the export_saved_model function, which requires you to build a serving_input_fn to handle any incoming requests when the model is reloaded.
I wanted to be able to pass a single string for sentiment analysis to the deployed model, rather than having a theoretical client-side application do the tokenisation and feature generation etc., so I tried to write an input function that would handle that and pass the constructed features to the model. Is this possible? I wrote the following, which I feel should do what I want:
import json
import base64

def plain_text_serving_input_fn():
    input_string = tf.placeholder(dtype=tf.string, shape=None, name='input_string_text')

    # What format to expect input in.
    receiver_tensors = {'input_text': input_string}

    input_examples = [run_classifier.InputExample(guid="", text_a=str(input_string), text_b=None, label=0)]  # here, "" is just a dummy label
    input_features = run_classifier.convert_examples_to_features(input_examples, label_list, MAX_SEQ_LENGTH, tokenizer)

    variables = {}
    for i in input_features:
        variables["input_ids"] = i.input_ids
        variables["input_mask"] = i.input_mask
        variables["segment_ids"] = i.segment_ids
        variables["label_id"] = i.label_id

    feature_spec = {
        "input_ids": tf.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
        "input_mask": tf.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
        "segment_ids": tf.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
        "label_ids": tf.FixedLenFeature([], tf.int64)
    }

    string_variables = json.dumps(variables)
    encode_input = base64.b64encode(string_variables.encode('utf-8'))
    encode_string = base64.decodestring(encode_input)

    features_to_input = tf.parse_example([encode_string], feature_spec)

    return tf.estimator.export.ServingInputReceiver(features_to_input, receiver_tensors)
I would expect that this would allow me to call predict on my deployed model with
variables = {"input_text" : "This is some test input"}
predictor.predict(variables)
I've tried a range of variations of this (putting it in an array, converting to base64, etc.), but I get a range of errors, either telling me
"error": "Failed to process element: 0 of 'instances' list. Error: Invalid argument: JSON Value: {\n \"input_text\": \"This is some test input\"\n} not formatted correctly for base64 data" }"
or
Object of type 'bytes' is not JSON serializable
I suspect I'm formatting my requests incorrectly, but I also can't find any examples of something similar being done in a serving_input_fn. Has anyone ever done something similar?

Need to pass multiple variables to a payload

I am trying to pass multiple variables to a payload using format and + str(var) + concatenation, but I am not getting the expected output. I have the hostnames in a file and take a password as input, and I want to pass both to the payload.
I am getting an error along the lines of "Error while parsing JSON payload or an incompatible argument type for the requested resource".
for x in content:
    url = 'https://url/a/b/c/{}'.format(x.strip())
    payload = ('{{"ip-address": "x.x.x.x","user-name": "john","password": "' + str(Pass) + '","db-name": "' + str(x.strip()) + '","service-name": "y","port": "y","connection-string": "y"}}')
    response = req.post(url, json=payload, headers=add_cookie, verify=False)
======================
for x in content:
    url = 'https://url/a/b/c/{}'.format(x.strip())
    payload = {"ip-address": "x.x.x.x","user-name": "john","password": "{}","db-name": "{}","service-name": "y","port": "y","connection-string": "y"}.format(Pass, x.strip())
    response = req.post(url, json=payload, headers=add_cookie, verify=False)
In the first part your payload is a string, not a dict; it should be
payload={"ip-address": "x.x.x.x","user-name": "john","password": str(Pass),"db-name": str(x.strip()),"service-name": "y","port": "y","connection-string": "y"}
In the second one you're calling format() on a dict, which is wrong: format() is a string method, and the {} placeholders sit inside the dict's values, so they would not be substituted anyway.
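Putting it together, a minimal sketch of the corrected loop (keeping the names from your question: req, content, Pass, add_cookie) could look like this:
for x in content:
    host = x.strip()
    url = 'https://url/a/b/c/{}'.format(host)
    payload = {
        "ip-address": "x.x.x.x",
        "user-name": "john",
        "password": str(Pass),
        "db-name": host,
        "service-name": "y",
        "port": "y",
        "connection-string": "y",
    }
    # json= makes requests serialize the dict and set the Content-Type header for us
    response = req.post(url, json=payload, headers=add_cookie, verify=False)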

AvroTypeException when writing in Python 3

My avsc file is as follows:
{"type":"record",
"namespace":"testing.avro",
"name":"product",
"aliases":["items","services","plans","deliverables"],
"fields":
[
{"name":"id", "type":"string" ,"aliases":["productid","itemid","item","product"]},
{"name":"brand", "type":"string","doc":"The brand associated", "default":"-1"},
{"name":"category","type":{"type":"map","values":"string"},"doc":"the list of categoryId, categoryName associated, send Id as key, name as value" },
{"name":"keywords", "type":{"type":"array","items":"string"},"doc":"this helps in long run in long run analysis, send the search keywords used for product"},
{"name":"groupid", "type":["string","null"],"doc":"Use this to represent or flag value of group to which it belong, e.g. it may be variation of same product"},
{"name":"price", "type":"double","aliases":["cost","unitprice"]},
{"name":"unit", "type":"string", "default":"Each"},
{"name":"unittype", "type":"string","aliases":["UOM"], "default":"Each"},
{"name":"url", "type":["string","null"],"doc":"URL of the product to return for more details on product, this will be used for event analysis. Provide full url"},
{"name":"imageurl","type":["string","null"],"doc":"Image url to display for return values"},
{"name":"updatedtime", "type":"string"},
{"name":"currency","type":"string", "default":"INR"},
{"name":"image", "type":["bytes","null"] , "doc":"fallback in case we cant provide the image url, use this judiciously and limit size"},
{"name":"features","type":{"type":"map","values":"string"},"doc":"Pass your classification attributes as features in key-value pair"}
]}
I am able to parse this, but when I try to write with it as follows, I keep getting an issue. What am I missing? This is in Python 3. I verified that it is well-formatted JSON, too.
from avro import schema as sc
from avro import datafile as df
from avro import io as avio
import os

_prodschema = 'product.avsc'
_namespace = 'testing.avro'

dirname = os.path.dirname(__file__)
avroschemaname = os.path.join(os.path.dirname(__file__), _prodschema)

sch = {}
with open(avroschemaname, 'r') as f:
    sch = f.read().encode(encoding='utf-8')
    f.close()

proschema = sc.Parse(sch)
print("Schema processed")

writer = df.DataFileWriter(open(os.path.join(dirname, "products.json"), 'wb'),
                           avio.DatumWriter(), proschema)
print("Just about to append the json")

writer.append({"id": "23232",
               "brand": "Relaxo",
               "category": [{"123": "shoe", "122": "accessories"}],
               "keywords": ["relaxo", "shoe"],
               "groupid": "",
               "price": "799.99",
               "unit": "Each",
               "unittype": "Each",
               "url": "",
               "imageurl": "",
               "updatedtime": "03/23/2017",
               "currency": "INR",
               "image": "",
               "features": [{"color": "black", "size": "10", "style": "contemperory"}]
               })
writer.close()
What am I missing here?
