AvroTypeException: When writing in python3 - python-3.x

My avsc file is as follows:
{"type":"record",
"namespace":"testing.avro",
"name":"product",
"aliases":["items","services","plans","deliverables"],
"fields":
[
{"name":"id", "type":"string" ,"aliases":["productid","itemid","item","product"]},
{"name":"brand", "type":"string","doc":"The brand associated", "default":"-1"},
{"name":"category","type":{"type":"map","values":"string"},"doc":"the list of categoryId, categoryName associated, send Id as key, name as value" },
{"name":"keywords", "type":{"type":"array","items":"string"},"doc":"this helps in long run in long run analysis, send the search keywords used for product"},
{"name":"groupid", "type":["string","null"],"doc":"Use this to represent or flag value of group to which it belong, e.g. it may be variation of same product"},
{"name":"price", "type":"double","aliases":["cost","unitprice"]},
{"name":"unit", "type":"string", "default":"Each"},
{"name":"unittype", "type":"string","aliases":["UOM"], "default":"Each"},
{"name":"url", "type":["string","null"],"doc":"URL of the product to return for more details on product, this will be used for event analysis. Provide full url"},
{"name":"imageurl","type":["string","null"],"doc":"Image url to display for return values"},
{"name":"updatedtime", "type":"string"},
{"name":"currency","type":"string", "default":"INR"},
{"name":"image", "type":["bytes","null"] , "doc":"fallback in case we cant provide the image url, use this judiciously and limit size"},
{"name":"features","type":{"type":"map","values":"string"},"doc":"Pass your classification attributes as features in key-value pair"}
]}
I am able to parse this but when I try to write on this as follows, I keep getting issue. What am I missing ? This is in python3. I verified it is well formated json, too.
from avro import schema as sc
from avro import datafile as df
from avro import io as avio
import os
_prodschema = 'product.avsc'
_namespace = 'testing.avro'
dirname = os.path.dirname(__file__)
avroschemaname = os.path.join( os.path.dirname(__file__),_prodschema)
sch = {}
with open(avroschemaname,'r') as f:
sch= f.read().encode(encoding='utf-8')
f.close()
proschema = sc.Parse(sch)
print("Schema processed")
writer = df.DataFileWriter(open(os.path.join(dirname,"products.json"),'wb'),
avio.DatumWriter(),proschema)
print("Just about to append the json")
writer.append({ "id":"23232",
"brand":"Relaxo",
"category":[{"123":"shoe","122":"accessories"}],
"keywords":["relaxo","shoe"],
"groupid":"",
"price":"799.99",
"unit":"Each",
"unittype":"Each",
"url":"",
"imageurl":"",
"updatedtime": "03/23/2017",
"currency":"INR",
"image":"",
"features":[{"color":"black","size":"10","style":"contemperory"}]
})
writer.close()
What am I missing here ?

Related

Pagination not working in Python Session.put()

I am trying to upload a file to a website (that has an inbuilt API) using the following code. The code reads a list of medical codes/diagnoses codes etc. (1 column in a text file) and uploads it to the required page.
Issue:
After uploading the file, I noticed that the number pages is not coming out properly. There can be up to 4000 codes (lines) in the file. The code list page in the website will show 20 lines per page, which means, I would expect at least 200 pages to be there after uploading. This is not happening. I am not sure what is the mistake that I am doing.
Also, I am new to Python (primarily SAS) and have been working on automating bits and pieces of code. One such automation is this exercise. Here, the goal is to upload multiple files to the said URL. Today the team is uploading them one by one manually. With the knowledge I picked up from tutorials and other sources, I was able to come up with this.
import requests
import json
import os
import random
import pandas as pd
import time
token = os.environ.get("USER_TOKEN")
user_id = os.environ.get("USER_ID")
user_name = os.environ.get("USER_NAME")
headers = {"X-API-Key": token}
url = 'https://XXXXXXXXXXXX.com/api/code_lists'
session=requests.session()
cl = session.get(url, headers=headers).json()
def uploading_files(file,name,kind,coding_system,rand_id):
df = pd.read_table(file, converters={0:str}, header=None)
print("Came In")
CODES = df[0].astype('str').tolist()
codes = {"codes": CODES}
new_cl = {"_id": rand_id, "name": name, "project_group": "TEST BETA", "kind": kind,
"coding_system": coding_system, "user": user_id, "creator": user_name, "creation_method": "Upload", "is_category_mapping": False,
"assoc_users": [], "global": True, "readonly": False, "description": "", "num_codes": len(CODES)}
request_json = json.dumps(new_cl)
print(request_json)
codes_json = json.dumps(codes)
print(codes_json)
session.post(url, data=request_json)
session.put(url + '/' + rand_id, data=codes_json)
text_Files= os.listdir(r'C://Users//XXXXXXXXXXXXX//data')
for i in text_Files:
if ".txt" in i:
x=i.split("_")
file='C://Users//XXXXXXXXXXXXX//data//' + i
name=""
for j in i[:-4]:
if j!="_":
name+=j
elif j=="_":
name+=" "
kind=x[2]
coding_system=x[3][:-4]
rand_id = "".join(random.choice("0123456789abcdef") for i in range(24))
print("-------------START-----------------")
print("file : ", file)
print("name : ", name)
print("kind : ", kind)
print("coding system : ", coding_system)
print("Rand_Id : ", rand_id)
uploading_files(file, name, kind, coding_system, rand_id)
time.sleep(2)
print("---------------END---------------")
print("")
break ''' to upload only 1 file in the directory'''
Example data in the file (testfile.txt)
C8900
C8901
C8902
C8903
C8904
C8905
C8906
C8907
C8908
C8909
C8910
C8911
C8912
C8913
C8914
C8918
C8919
C8920
C8921
C8922
C8923
C8924
C8925
C8926
C8927
C8928
C8929
C8930
C8931
C8932
C8933
C8934
C8935
C8936
C9723
C9744
C9762
C9763
C9803
D0260
Sample Data Snapshot
Wrong Representation
Expected

Doing feature generation in serving_input_fn for Tensorflow model

I've been playing around with BERT and TensorFlow following the example here and have a trained working model.
I then wanted to save and deploy the model, so used the export_saved_model function, which requires you build a serving_input_fn to handle any incoming requests when the model is reloaded.
I wanted to be able to pass a single string for sentiment analysis to the deployed model, rather than having a theoretical client side application do the tokenisation and feature generation etc, so tried to write an input function that would handle that and pass the constructed features to the model. Is this possible? I wrote the following which I feel should do what I want:
import json
import base64
def plain_text_serving_input_fn():
input_string = tf.placeholder(dtype=tf.string, shape=None, name='input_string_text')
# What format to expect input in.
receiver_tensors = {'input_text': input_string}
input_examples = [run_classifier.InputExample(guid="", text_a = str(input_string), text_b = None, label = 0)] # here, "" is just a dummy label
input_features = run_classifier.convert_examples_to_features(input_examples, label_list, MAX_SEQ_LENGTH, tokenizer)
variables = {}
for i in input_features:
variables["input_ids"] = i.input_ids
variables["input_mask"] = i.input_mask
variables["segment_ids"] = i.segment_ids
variables["label_id"] = i.label_id
feature_spec = {
"input_ids" : tf.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
"input_mask" : tf.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
"segment_ids" : tf.FixedLenFeature([MAX_SEQ_LENGTH], tf.int64),
"label_ids" : tf.FixedLenFeature([], tf.int64)
}
string_variables = json.dumps(variables)
encode_input = base64.b64encode(string_variables.encode('utf-8'))
encode_string = base64.decodestring(encode_input)
features_to_input = tf.parse_example([encode_string], feature_spec)
return tf.estimator.export.ServingInputReceiver(features_to_input, receiver_tensors)
I would expect that this would allow me to call predict on my deployed model with
variables = {"input_text" : "This is some test input"}
predictor.predict(variables)
I've tried a range of variations of this (putting it in an array, converting to base 64 etc), but I get a range of errors either telling me
"error": "Failed to process element: 0 of 'instances' list. Error: Invalid argument: JSON Value: {\n \"input_text\": \"This is some test input\"\n} not formatted correctly for base64 data" }"
or
Object of type 'bytes' is not JSON serializable
I suspect I'm formatting my requests incorrectly, but I also can't find any examples of something similar being done in a serving_input_fn, so has anyone ever done something similar?

InvalidResponse error in bigquery load_table_from_file

I am trying to upload a set of csv data into a BigQuery from a BytesIO object, but keep getting an error InvalidResponse: Response headers must contain header 'location'
Here is my code
# self.database = authenticated bigquery.Client
config = bigquery.LoadJobConfig()
config.skip_leading_rows = 1
config.source_format = bigquery.SourceFormat.CSV
config.allow_jagged_rows = True
schema = [
bigquery.SchemaField("date", "DATE", mode="REQUIRED"),
bigquery.SchemaField("page_id", "STRING", mode="REQUIRED")
]
# ... Appending a list of bigquery.SchemaField("name", "INTEGER")
config.schema = schema
table = self.get_or_create_table(name, config.schema) # returns TableReference
file = self.clip_data(local_fp, cutoff_date) # returns BytesIO
job = self.database.load_table_from_file(
file, table,
num_retries=self.options.num_retries,
job_id=uuid.uuid4().int,
job_config=config
) # Error is here.
I have tried searching around but I cannot find any reason or fix for this exception.
InvalidResponse: ('Response headers must contain header', 'location')
The problem was caused by not providing a location in the load_table_from_file method.
location="US"
was enough to fix the problem.

Python - Error querying Solarwinds N-Central via SOAP

I'm using python 3 to write a script that generates a customer report for Solarwinds N-Central. The script uses SOAP to query N-Central and I'm using zeep for this project. While not new to python I am new to SOAP.
When calling the CustomerList fuction I'm getting the TypeError: __init__() got an unexpected keyword argument 'listSOs'
import zeep
wsdl = 'http://' + <server url> + '/dms/services/ServerEI?wsdl'
client = zeep.CachingClient(wsdl=wsdl)
config = {'listSOs': 'true'}
customers = client.service.CustomerList(Username=nc_user, Password=nc_pass, Settings=config)
Per the perameters below 'listSOs' is not only a valid keyword, its the only one accepted.
CustomerList
public com.nable.nobj.ei.Customer[] CustomerList(String username, String password, com.nable.nobj.ei.T_KeyPair[] settings) throws RemoteException
Parameters:
username - MSP N-central username
password - Corresponding MSP N-central password
settings - A list of non default settings stored in a T_KeyPair[]. Below is a list of the acceptable Keys and Values. If not used leave null
(Key) listSOs - (Value) "true" or "false". If true only SOs with be shown, if false only customers and sites will be shown. Default value is false.
I've also tried passing the dictionary as part of a list:
config = []
key = {'listSOs': 'true'}
config += key
TypeError: Any element received object of type 'str', expected lxml.etree._Element or builtins.dict or zeep.objects.T_KeyPair
Omitting the Settings value entirely:
customers = client.service.CustomerList(Username=nc_user, Password=nc_pass)
zeep.exceptions.ValidationError: Missing element Settings (CustomerList.Settings)
And trying zeep's SkipValue:
customers = client.service.CustomerList(Username=nc_user, Password=nc_pass, Settings=zeep.xsd.SkipValue)
zeep.exceptions.Fault: java.lang.NullPointerException
I'm probably missing something simple but I've been banging my head against the wall off and on this for awhile I'm hoping someone can point me in the right direction.
Here's my source code from my getAssets.py script. I did it in Python2.7, easily upgradeable though. Hope it helps someone else, N-central's API documentation is really bad lol.
#pip2.7 install zeep
import zeep, sys, csv, copy
from zeep import helpers
api_username = 'your_ncentral_api_user'
api_password='your_ncentral_api_user_pw'
wsdl = 'https://(yourdomain|tenant)/dms2/services2/ServerEI2?wsdl'
client = zeep.CachingClient(wsdl=wsdl)
response = client.service.deviceList(
username=api_username,
password=api_password,
settings=
{
'key': 'customerId',
'value': 1
}
)
# If you can't tell yet, I code sloppy
devices_list = []
device_dict = {}
dev_inc = 0
max_dict_keys = 0
final_keys = []
for device in response:
# Iterate through all device nodes
for device_properties in device.items:
# Iterate through each device's properties and add it to a dict (keyed array)
device_dict[device_properties.first]=device_properties.second
# Dig further into device properties
device_properties = client.service.devicePropertyList(
username=api_username,
password=api_password,
deviceIDs=device_dict['device.deviceid'],
reverseOrder=False
)
prop_ind = 0 # This is a hacky thing I did to make my CSV writing work
for device_node in device_properties:
for prop_tree in device_node.properties:
for key, value in helpers.serialize_object(prop_tree).items():
prop_ind+=1
device_dict["prop" + str(prop_ind) + "_" + str(key)]=str(value)
# Append the dict to a list (array), giving us a multi dimensional array, you need to do deep copy, as .copy will act like a pointer
devices_list.append(copy.deepcopy(device_dict))
# check to see the amount of keys in the last item
if len(devices_list[-1].keys()) > max_dict_keys:
max_dict_keys = len(devices_list[-1].keys())
final_keys = devices_list[-1].keys()
print "Gathered all the datas of N-central devices count: ",len(devices_list)
# Write the data out to a CSV
with open('output.csv', 'w') as csvfile:
fieldnames = final_keys
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()
for csv_line in devices_list:
writer.writerow(csv_line)

How to compress warc records with lzma (*.warc.xz) in python3?

I have a list of warc records. Every single item in list is created like this:
header = warc.WARCHeader({
"WARC-Type": "response",
"WARC-Target-URI": "www.somelink.com",
}, defaults=True)
data = "Some string"
record = warc.WARCRecord(header, data.encode('utf-8','replace'))
Now, I am using *.warc.gz to store my records like this:
output_file = warc.open("my_file.warc.gz", 'wb')
And write records like this:
output_file.write_record(record) # type of record is WARCRecord
But how can I compress with lzma as *.warc.xz? I have tried replacing gz with xz when callig warc.open, but warc in python3 do not support this format. I have found this trial, but I was not able to save WARCRecord with this:
output_file = lzma.open("my_file.warc.xz", 'ab', preset=9)
header = warc.WARCHeader({
"WARC-Type": "response",
"WARC-Target-URI": "www.somelink.com",
}, defaults=True)
data = "Some string"
record = warc.WARCRecord(header, data.encode('utf-8','replace'))
output_file.write(record)
The error message is:
TypeError: a bytes-like object is required, not 'WARCRecord'
Thanks for any help.
The WARCRecord class has a write_to method, to write records to a file object.
You could use that to write records to a file created with lzma.open().

Resources