How do I use Mindee API with Python3? - python-3.x

I'm just playing about with code and am interested in parsing receipts to text (with a CSV file as the end point). I came across this tutorial on the Mindee API, which also provides code to run the parsing. However, I keep getting the errors below when attempting to parse.
import requests
url = "https://api.mindee.net/v1/products/mindee/expense_receipts/v3/predict"
with open("/Users/test/PycharmProjects/PythonCrashCourse", "rb") as myfile: # Here they mention to specify the PATH to file/files which is here as per my windows10 path.
    files = {"IMG_5800.jpg": myfile}
    headers = {"Authorization": "Token asdasd21321"}
    response = requests.post(url, files=files, headers=headers)
    print(response.text)
PermissionError: [Errno 13] Permission denied: '/Users/test/PycharmProjects/PythonCrashCourse'
Why is permission denied when I am an admin and have full permissions enabled on the file itself?
I have also tried modifying the code and running the following:
import requests
url = "https://api.mindee.net/v1/products/mindee/expense_receipts/v3/predict"
imageFile = "IMG_5800.jpg" #File is in the current directory
files = {"file": open(imageFile, "rb")}
headers = {"Authorization": "Token a4342343c925a"}
response = requests.post(url, files=files, headers=headers)
print(response.text)
#output
{"api_request":{"error":{"code":"BadRequest","details":{"document":["Missing data for required field."],"file":["Unknown field."]},"message":"Invalid fields in form"},"resources":[],"status":"failure","status_code":400,"url":"http://api.mindee.net/v1/products/mindee/expense_receipts/v3/predict"}}
Process finished with exit code 0
Status code 400 suggests something has gone wrong with the syntax. Unfortunately I am stuck and simply want the API to parse my receipt. Any ideas on what is going wrong, please?
Desired output:
get results from receipt in text format/json from Mindee API
References Used:
https://medium.com/mindeeapi/extract-receipt-data-with-mindees-api-using-python-7ee7303f4b6d tutorial on Mindee API
https://platform.mindee.com/products/mindee/expense_receipts?setup=default#documentation

The error message states that the document field is missing: the API expects the file under the key document, not file.
I'm glad you found a solution to this.
However, the documentation now gives improved code, because the authentication header X-Inferuser-Token has been deprecated.
You can try this instead:
import requests
url = "https://api.mindee.net/v1/products/mindee/expense_receipts/v3/predict"
with open("./IMG_5800.jpg", "rb") as myfile:
    files = {"document": myfile}
    headers = {"Authorization": "Token my-api-key-here"}
    response = requests.post(url, files=files, headers=headers)
    print(response.text)
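To see at a glance which form fields the API rejected, the error body quoted in the question can be parsed rather than read by eye. This is only a minimal sketch: field_errors is a hypothetical helper name, and the sample string is the response from the question, not a fresh API call.

```python
import json

def field_errors(payload):
    """Return the per-field error lists from an error body shaped like the one in the question."""
    body = json.loads(payload)
    details = body.get("api_request", {}).get("error", {}).get("details", {})
    # "details" is assumed to be a dict of field name -> list of messages.
    return details if isinstance(details, dict) else {}

# Error body copied from the question.
sample = (
    '{"api_request":{"error":{"code":"BadRequest","details":'
    '{"document":["Missing data for required field."],"file":["Unknown field."]},'
    '"message":"Invalid fields in form"},"resources":[],"status":"failure",'
    '"status_code":400,'
    '"url":"http://api.mindee.net/v1/products/mindee/expense_receipts/v3/predict"}}'
)

for field, messages in field_errors(sample).items():
    print(f"{field}: {'; '.join(messages)}")
```

Read this way, the two messages point at the same mistake: the upload was sent under the key file while the endpoint was looking for document.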

After brushing up on relative file paths - https://www.codegrepper.com/code-examples/html/HTML+file+path. I realised the path I used was wrong: it pointed at the project folder rather than at the image file. The same relative path notation works whether I am on Windows or Mac.
To resolve my issue, I used ./ to point at the image file in the directory I run my code from.
with open("./IMG_5800.jpg", "rb") as myfile: #modified here: ./ is the directory the code runs from, where the image file is
    files = {"file": myfile}
    headers = {"X-Inferuser-Token": "Token my-api-key-here"}
    response = requests.post(url, files=files, headers=headers)
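For anyone hitting the same PermissionError: on Windows, calling open() on a folder raises PermissionError: [Errno 13] even for an admin, which matches the first attempt, where the path ended at the project folder. A small check before opening makes the cause obvious. This is a rough sketch; resolve_image is a hypothetical helper, and the temporary directory only stands in for the project folder.

```python
import tempfile
from pathlib import Path

def resolve_image(path_str):
    """Return a Path to an existing file, or raise with a hint (hypothetical helper)."""
    path = Path(path_str)
    if path.is_dir():
        raise IsADirectoryError(f"{path} is a directory; add the file name to the path")
    if not path.is_file():
        raise FileNotFoundError(f"{path} does not exist")
    return path

# Demonstration in a throwaway directory standing in for the project folder.
with tempfile.TemporaryDirectory() as project_dir:
    image = Path(project_dir) / "IMG_5800.jpg"
    image.write_bytes(b"placeholder image bytes")

    resolved = resolve_image(str(image))  # path to the file itself: accepted
    print(resolved.name)

    try:
        resolve_image(project_dir)        # path to the folder: rejected
        folder_rejected = False
    except IsADirectoryError:
        folder_rejected = True
    print(folder_rejected)
```

The path returned by the helper can then be passed to open(..., "rb") as in the snippets above.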

Related

HTTP 405 when making a Python PUT request

I have a requirement to make a PUT request from Python, and I have been consistently getting an HTTP 405 response code. Any pointers on the code below would be great.
filepath = './sdfdd/sdfdsst/xxxxxxxxxrrrarara.json'
with open(filepath) as fh:
    mydata = fh.read()
    response = requests.put('https://asdfs.sdf.sdfds.com',
        data=mydata,
        auth=('Authorization', 'Api-Token dsdfdsfsdfsdf'),
        headers={'content-type':'application/json'},
        params={'file': filepath},
        allow_redirects=True
    )
print(response)
The issue was caused by an incorrect API endpoint.

python download file into memory and handle broken links

I'm using the following code to download a file into memory :
if 'login_before_download' in obj.keys():
    with requests.Session() as session:
        session.post(obj['login_before_download'], verify=False)
        request = session.get(obj['download_link'], allow_redirects=True)
else:
    request = requests.get(obj['download_link'], allow_redirects=True)
print("downloaded {} into memory".format(obj[download_link_key]))
file_content = request.content
obj is a dict that contains the download_link and another key that indicates whether I need to log in to the page to create a cookie.
The problem with my code is that if the URL is broken and there isn't any file to download, I still get the HTML content of the page instead of detecting that the download failed.
Is there any way to detect that the file wasn't downloaded?
I found the following solution in this url :
import requests
def is_downloadable(url):
    """
    Does the url contain a downloadable resource
    """
    h = requests.head(url, allow_redirects=True)
    header = h.headers
    # Fall back to an empty string when the server sends no content-type
    content_type = header.get('content-type', '')
    if 'text' in content_type.lower():
        return False
    if 'html' in content_type.lower():
        return False
    return True
print(is_downloadable('https://www.youtube.com/watch?v=9bZkp7q19f0'))
# >> False
print(is_downloadable('http://google.com/favicon.ico'))
# >> True
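Since the question is about broken links, it may also help to look at the status code and not only the content type. The decision can be pulled into a pure function that is easy to check without touching the network. This is a sketch under assumptions: looks_downloadable is a hypothetical name, and it keeps the same content-type heuristic as the snippet above.

```python
def looks_downloadable(status_code, headers):
    """Guess whether a response is a real file rather than an error or HTML page."""
    if status_code >= 400:
        return False  # 4xx/5xx: the link is broken or the server failed
    # Plain dicts are case-sensitive, so normalise the header names first.
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get("content-type", "").lower()
    return not ("text" in content_type or "html" in content_type)

print(looks_downloadable(200, {"Content-Type": "image/x-icon"}))
print(looks_downloadable(200, {"Content-Type": "text/html; charset=utf-8"}))
print(looks_downloadable(404, {"Content-Type": "image/x-icon"}))
```

With requests, the two arguments would come from response.status_code and response.headers. Note that the heuristic also rejects legitimate text downloads such as text/csv, so it may need loosening depending on the files involved.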

How to fix os.path error in windows machine?

I am trying to write a function that sends an email with or without an attachment, depending on whether an attachment file is passed in, but it's failing with a file not found error.
Here is my function
def sendEmail(TO, FROM, SUBJECT, BODY, *FILETOSEND):
    """Function to send email"""
    # Create message container - the correct MIME type is multipart/alternative.
    msg = MIMEMultipart("alternative")
    msg["Subject"] = SUBJECT
    msg["From"] = FROM
    msg["To"] = TO
    if FILETOSEND:
        file_string = str(FILETOSEND)
        finalFile = file_string[1 : len(file_string) - 2]
        fp = open(finalFile)
        attachment = MIMEText(fp.read())
        fp.close()
        attachment.add_header("Content-Disposition", "attachment", filename=fp)
        msg.attach(attachment)
    # Create the body of the message
    text = BODY
    part1 = MIMEText(text, "plain")
    msg.attach(part1)
    # Send the message via local SMTP server.
    s = smtplib.SMTP("smtpservername")
    s.sendmail(msg["From"], msg["To"], msg.as_string())
    s.quit()
fileToSend = r"C:\Users\n123456\Desktop\DomainFolder\D1\NEWDATASET.txt"
sendEmail(TO, FROM, SUBJECT, BODY, fileToSend)
I passed all the arguments without fileToSend and it worked, but when I pass fileToSend it fails with an OSError:
OSError: [Errno 22] Invalid argument:
"'C:\\\\Users\\\\n123456\\\\Desktop\\\\DomainFolder\\\\D1\\\\NEWDATASET.txt'"
I have tested with the file placed in a different directory and on a different drive, and also with forward slashes, but I still get the same issue.
As *FILETOSEND gives a tuple, I am trying string manipulation to turn it into a correct path, but with no luck.
I am using Windows 10 with Python 3.8. Seeking help.
Try using the path like this:
open(r"C:\Users\n123456\Desktop\DomainFolder\D1\NEWDATASET.txt","r")
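It may also be worth looking at how the path is taken out of the tuple. Converting the *FILETOSEND tuple with str() and slicing it leaves the quotes and doubled backslashes inside the string, which is what the OSError above shows. Indexing the tuple gives the original path unchanged. A minimal sketch; first_attachment is a hypothetical helper, not part of the asker's code.

```python
def first_attachment(*files_to_send):
    """Return the first path passed through *files_to_send, or None if there is none."""
    return files_to_send[0] if files_to_send else None

path = r"C:\Users\n123456\Desktop\DomainFolder\D1\NEWDATASET.txt"

# What the string manipulation in the question produces from the tuple:
as_text = str((path,))
mangled = as_text[1 : len(as_text) - 2]
print(mangled)                 # still wrapped in quotes, backslashes doubled

# What indexing the tuple produces:
print(first_attachment(path))  # the path exactly as it was passed in
```

Inside sendEmail this would mean opening FILETOSEND[0] directly instead of building finalFile from the string form of the tuple.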

Unable to get the response in POST method in Python

I am facing a unique problem.
Following is my code.
url = 'ABCD.com'
cookies={'cookies':'xyz'}
r = requests.post(url,cookies=cookies)
print(r.status_code)
json_data = json.loads(r.text)
print("Printing = ",json_data)
When I use the URL and cookie in the Postman tool with a POST request, I get a JSON response. But when I use the above code with the POST request method in Python, I get
404
Printing = {'title': 'init', 'description': "Error: couldn't find a device with id: xxxxxxxxx in ABCD: d1"}
But when I use the following code, i.e. with the GET request method
url = 'ABCD.com'
cookies={'cookies':'xyz'}
r = requests.get(url,cookies=cookies)
print(r.status_code)
json_data = json.loads(r.text)
print("Printing = ",json_data)
I get
200
Printing = {'apiVersion': '0.4.0'}
I am not sure why the POST method returns a JSON response in Postman but does not work when I try it in Python. I am using Python 3.6.4.
I finally found what was wrong. The following is the correct way:
url = 'ABCD.com'
cookies = 'cookies=xyz'  # a header value must be a string, not a dict
r = requests.post(url, headers={'Cookie': cookies})
print(r.status_code)
json_data = json.loads(r.text)
print("Printing = ",json_data)
The web page was expecting the cookie as a header, and with that I got the response correctly.
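If the cookies are already held in a dict, as in the question, the header value can be built from it instead of being typed by hand. This is only a sketch: cookie_header is a hypothetical helper, and it assumes simple name=value cookies with no special characters.

```python
def cookie_header(cookies):
    """Join a dict of cookies into a single Cookie header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())

cookies = {'cookies': 'xyz'}
headers = {'Cookie': cookie_header(cookies)}
print(headers)
```

The resulting headers dict can then be passed as headers=headers to requests.post.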

Google API Python Client: MediaIoBaseDownload: Problems with 'contentEncoding' of type 'gzip'

I am going in circles trying to figure out how to download a CSV file whose 'contentEncoding' is 'gzip' from Google Cloud using their google-api-python-client.
My issue: I am not able to download a file that has 'contentEncoding' set to 'gzip'. Its 'md5Hash' does not match what was downloaded, and its 'size' does not match either (the download is much larger).
This is the object's metadata:
{
'selfLink':'https://www.googleapis.com/storage/v1/b/pubsite_prod_rev_0123456789/o/stats%2Finstalls%2Finstalls_com.foobar.helloworld_201512_country.csv',
'etag':'ETAG=',
'mediaLink':'https://www.googleapis.com/download/storage/v1/b/pubsite_prod_rev_0123456789/o/stats%2Finstalls%2Finstalls_com.foobar.helloworld_201512_country.csv?generation=1451747575795000&alt=media',
'id':'pubsite_prod_rev_0123456789/stats/installs/installs_com.foobar.helloworld_201512_country.csv/1451747575795000',
'name':'stats/installs/installs_com.foobar.helloworld_201512_country.csv',
'contentType':'text/csv; charset=utf-16le',
'contentEncoding':'gzip',
'size':9260,
'md5Hash':'MD5HASH==',
'kind':'storage#object',
'crc32c':'CRC32C==',
'storageClass':'STANDARD'
}
When I download the object's media, there are 2 problems:
The metadata md5Hash value does not match the one calculated from the download using md5(data).hexdigest().
The metadata size (9260) does not match the download size (288386).
Here is the code:
request = service.objects().get_media(
    bucket=bucket_name,
    object=object_name
)
with io.BytesIO() as compressed_file:
    downloader = MediaIoBaseDownload(compressed_file, request, chunksize=1024*1024)
    progressless_iters = 0
    done = False
    while not done:
        error = None
        try:
            progress, done = downloader.next_chunk()
            if progress:
                self.logger.info(
                    'Download %d%%.' % int(progress.progress() * 100)
                )
        except HttpError as err:
            error = err
            if err.resp.status < 500:
                raise
        except RETRYABLE_ERRORS as err:
            error = err
        if error:
            progressless_iters += 1
            self._HandleProgresslessIter(error, progressless_iters)
        else:
            progressless_iters = 0
    self.logger.info('\nDownload complete!')
    data = compressed_file.getvalue()
    original_md5 = src_obj_metadata['md5Hash']
    md5_returned = md5(data).hexdigest()
    # Finally compare original MD5 with freshly calculated
    if original_md5 == md5_returned:
        logger.info("MD5 verified.")
    else:
        logger.info("MD5 verification failed!.")
    with open(download_file_name, 'wb') as fh_compressed:
        fh_compressed.write(data)
If I try to check the 'md5Hash' after downloaded file is closed, it errors:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
Here is that code:
# Open,close, read file and calculate MD5 on its contents
with open(download_file_name) as file_to_check:
    # read contents of the file
    data = file_to_check.read()
    # pipe contents of the file through
    md5_returned = md5(data).hexdigest()
If I try to decompress the downloaded file, it errors:
OSError: Not a gzipped file
Here is the code:
with open(download_file_name, 'rb') as fh_compressed:
    with open(csv_file_name, 'wb') as fh_decompressed:
        fh_decompressed.write(gzip.decompress(fh_compressed.read()))
What am I doing wrong in order to properly download a CSV file that is 'contentEncoded' as 'gzip'?
Thank you, much appreciated.
I dropped the approach of using google-api-python-client because I found no way to set the required header 'Accept-Encoding': 'gzip, deflate' other than modifying the library.
Instead, I found 2 solutions:
Reading PlayStore csv review files from Google storage bucket using Java App Engine
Using the Google Cloud HTTP/REST API and setting the header 'Accept-Encoding': 'gzip, deflate'.
I used the second solution.
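For context on the symptoms in the question, my reading (not something confirmed above) is that the downloaded bytes had already been decompressed in transit, which would explain the larger size, the "Not a gzipped file" error, and the 0xff first byte (the UTF-16LE byte-order mark of the CSV). Separately, Cloud Storage's md5Hash field is a base64-encoded MD5 of the stored bytes, so it cannot equal a hexdigest() string. The sketch below uses synthetic data and hypothetical helper names to show how to check both things locally.

```python
import base64
import gzip
import hashlib

def md5_base64(data):
    """MD5 digest encoded as base64, the encoding used by the 'md5Hash' metadata field."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")

def is_gzip(data):
    """True if the bytes start with the gzip magic number 0x1f 0x8b."""
    return data[:2] == b"\x1f\x8b"

# Synthetic stand-in for the object: UTF-16LE CSV text with a BOM, gzip-compressed.
csv_bytes = b"\xff\xfe" + "Date,Installs\n2015-12-01,42\n".encode("utf-16-le")
stored = gzip.compress(csv_bytes)

print(is_gzip(stored))        # the stored (compressed) bytes are gzip
print(is_gzip(csv_bytes))     # already-decompressed bytes are not
print(hex(csv_bytes[0]))      # first byte of the decompressed CSV
print(md5_base64(stored))     # comparable with 'md5Hash' for the stored bytes
print(hashlib.md5(stored).hexdigest())  # hex form, never equal to the base64 form
```

So when the download is served compressed (the 'Accept-Encoding': 'gzip, deflate' route above), md5_base64(data) is the value to compare against md5Hash, and gzip.decompress(data) followed by decoding as UTF-16 yields the CSV text.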
