I am trying to use the Microsoft Azure Form Recognizer API to upload an invoice PDF and get the table info inside it.
I was able to make a successful POST request.
But I am not able to train the model: I get the error 'No valid blobs found in the specified Azure blob container. Please conform to the document format/size/page/dimensions requirements.'.
However, I have more than 5 files in the blob storage container.
I have also provided the shared key for the blob container. You can find the code I have written and the error below.
"""
Created on Thu Feb 20 16:22:41 2020
#author: welcome
"""
########## Python Form Recognizer Labeled Async Train #############
import json
import time
from requests import get, post

# Endpoint URL
endpoint = r"https://sctesting.cognitiveservices.azure.com"
post_url = endpoint + r"/formrecognizer/v2.0-preview/custom/models"
print(post_url)

source = '<source url from blob storage>'
prefix = "name of the folder"
includeSubFolders = False
useLabelFile = False

headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': '<key>',
}

body = {
    "source": source,
    "sourceFilter": {
        "prefix": prefix,
        "includeSubFolders": includeSubFolders
    },
    "useLabelFile": useLabelFile
}

try:
    resp = post(url=post_url, json=body, headers=headers)
    if resp.status_code != 201:
        print("POST model failed (%s):\n%s" % (resp.status_code, json.dumps(resp.json())))
        quit()
    print("POST model succeeded:\n%s" % resp.headers)
    get_url = resp.headers["location"]
except Exception as e:
    print("POST model failed:\n%s" % str(e))
    quit()

# Poll the returned operation location until training finishes or we give up.
n_tries = 15
n_try = 0
wait_sec = 3
max_wait_sec = 60
while n_try < n_tries:
    try:
        resp = get(url=get_url, headers=headers)
        resp_json = resp.json()
        if resp.status_code != 200:
            print("GET model failed (%s):\n%s" % (resp.status_code, json.dumps(resp_json)))
            quit()
        model_status = resp_json["modelInfo"]["status"]
        if model_status == "ready":
            print("Training succeeded:\n%s" % json.dumps(resp_json))
            quit()
        if model_status == "invalid":
            print("Training failed. Model is invalid:\n%s" % json.dumps(resp_json))
            quit()
        # Training still running. Wait and retry with exponential backoff.
        time.sleep(wait_sec)
        n_try += 1
        wait_sec = min(2 * wait_sec, max_wait_sec)
    except Exception as e:
        msg = "GET model failed:\n%s" % str(e)
        print(msg)
        quit()

print("Train operation did not complete within the allocated time.")
Output obtained in the Anaconda prompt by running the above code:
POST model succeeded:
{'Content-Length': '0', 'Location': 'https://sctesting.cognitiveservices.azure.com/formrecognizer/v2.0-preview/custom/models/30b7d99b-fc57-466d-a59b-c0d9738c03ac', 'x-envoy-upstream-service-time': '379', 'apim-request-id': '18cbec13-8129-45de-8685-83554e8b35d4', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'x-content-type-options': 'nosniff', 'Date': 'Thu, 20 Feb 2020 19:35:47 GMT'}
Training failed. Model is invalid:
{"modelInfo": {"modelId": "30b7d99b-fc57-466d-a59b-c0d9738c03ac", "status": "invalid", "createdDateTime": "2020-02-20T19:35:48Z", "lastUpdatedDateTime": "2020-02-20T19:35:50Z"}, "trainResult": {"trainingDocuments": [], "errors": [{"code": "2014", "message": "No valid blobs found in the specified Azure blob container. Please conform to the document format/size/page/dimensions requirements."}]}}
If you use the Form Recognizer labeling tool to do the same thing, does that work? And have you put the files in the root directory of the Azure blob container, or in a subdirectory?
Make sure that the files in your blob storage container fit the requirements here: https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/overview#custom-model
If your files look fine, also check what kind of SAS token you are using. The error message you have can occur if you are using a policy defined SAS token, in which case, try switching to a SAS token with explicit permissions as detailed here: https://stackoverflow.com/a/60235222/12822344
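For reference, here is a minimal sketch of generating a container SAS with explicit permissions using the azure-storage-blob v12 package; the account name, key, and container name are placeholders you would fill in:

from datetime import datetime, timedelta
from azure.storage.blob import ContainerSasPermissions, generate_container_sas

account_name = "<storage account>"       # placeholder
account_key = "<storage account key>"    # placeholder
container_name = "<training container>"  # placeholder

# Explicit read/list permissions instead of a stored-access-policy SAS.
sas_token = generate_container_sas(
    account_name=account_name,
    container_name=container_name,
    account_key=account_key,
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=datetime.utcnow() + timedelta(hours=2),
)

source = "https://%s.blob.core.windows.net/%s?%s" % (account_name, container_name, sas_token)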
You didn't specify a source. You need to generate a Shared Access Signature (SAS) from the menu of the selected storage account. If you have a container in that storage account, you'll need to include the container name in the URL. For example, if you have a container named "train":
"www....windows.net/?sv=....." ---> "www....windows.net/train?sv=......"
Otherwise you can try to use the "prefix" string, but I found it buggy.
Also, you have not included your subscription key.
https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/python-train-extract
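Concretely, the source value posted in the request body should be that container-scoped SAS URL. A one-line sketch with placeholder account, container, and token:

# The container name ("train" in this example) sits between the host and the SAS query string.
source = "https://<account>.blob.core.windows.net/train?sv=<sas token>"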
Try removing the Source Filter from the body. It should work.
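That is, a minimal sketch reusing the variables from the script in the question:

# Omit sourceFilter entirely and train on everything the SAS URL exposes.
body = {"source": source}
resp = post(url=post_url, json=body, headers=headers)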
I created a resource on Microsoft Azure Face with F0 tier, then wrote the code below:
import requests

BASE_URL = "https://xxxxxxxxxxxxxxx.cognitiveservices.azure.com/face/v1.0/detect"
HEADERS = {
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': "[My Key]"
}
PARAMS = {
    'returnFaceLandmarks': 'true',
    'returnFaceId': 'true',
    'detectionModel': 'detection_03',
}
url = "https://u.cubeupload.com/Johann/test1.jpg"
data = {'url': url}
r = requests.post(BASE_URL, data=data, headers=HEADERS, params=PARAMS)
print(r.json())
Although I expected it to return face information, it instead returned:
{'error': {'code': 'InvalidRequest', 'message': 'Invalid request has been sent.', 'innererror': {'code': 'UnsupportedFeature', 'message': 'Feature is not supported, missing approval for one or more of the following features: Identification,Verification. Please apply for access at https://aka.ms/facerecognition'}}}
Therefore, I set the 'returnFaceId' attribute to 'false'. But then I was shown
{'error': {'code': 'BadArgument', 'message': 'JSON parsing error.'}}
I would like to know where my mistake is and how I should correct it. Here is some other information about my resource.
Status: Active
Region: East US
API Type: Face
Pricing Tier: F0 (Free)
My Access: Owner
Limited Access Approval: None
Here is a workaround: instead of using the REST API, you can use the Face SDK to detect the faces.
You have to create your FaceClient and then use the detect_with_url function to detect the faces.
Here I have created a console app which will print the result after detection.
from azure.cognitiveservices.vision.face import FaceClient
from msrest.authentication import CognitiveServicesCredentials

endpoint = ""
key = ""
image_url = ""

face_client = FaceClient(endpoint, CognitiveServicesCredentials(key))
# The SDK uses snake_case keyword arguments (return_face_id, not returnFaceId).
detected_faces = face_client.face.detect_with_url(
    url=image_url,
    detection_model='detection_03',
    return_face_landmarks=True,
    return_face_id=True,
)
print(detected_faces)
My endpoint is also in East US with the F0 pricing tier.
Output: (screenshot omitted)
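As an aside on the original JSON parsing error: requests form-encodes a dict passed via data=, even when the Content-Type header says application/json, so the endpoint receives a non-JSON body. If you stay with the REST call, letting requests serialize the body should clear that particular error (the separate returnFaceId approval requirement still applies). A sketch reusing the variables from the question:

# json= serializes the dict into a proper JSON body.
r = requests.post(BASE_URL, json=data, headers=HEADERS, params=PARAMS)
print(r.json())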
I have built a function in Python3 that will retrieve the data from a Google Spreadsheet. The data will contain the Recording's Download_URL and other information.
The function will download the Recording and store it in the local machine. Once the video is saved, the function will upload it to Google Drive using the Resumable Upload method.
Even though the response from the Resumable Upload method is 200 and it also gives me the Id of the file, I can't seem to find the file anywhere in my Google Drive. Below is my code.
import os
import requests
import json
import gspread
from oauth2client.service_account import ServiceAccountCredentials

DOWNLOAD_DIRECTORY = 'Parent_Folder'

def upload_recording(file_location, file_name):
    filesize = os.path.getsize(file_location)
    # Retrieve session for resumable upload.
    headers = {"Authorization": "Bearer " + access_token, "Content-Type": "application/json"}
    params = {
        "name": file_name,
        "mimeType": "video/mp4"
    }
    r = requests.post(
        "https://www.googleapis.com/upload/drive/v3/files?uploadType=resumable",
        headers=headers,
        data=json.dumps(params)
    )
    print(r)
    location = r.headers['Location']
    # Upload the file.
    headers = {"Content-Range": "bytes 0-" + str(filesize - 1) + "/" + str(filesize)}
    r = requests.put(
        location,
        headers=headers,
        data=open(file_location, 'rb')
    )
    print(r.text)
    return True

def download_recording(download_url, foldername, filename):
    upload_success = False
    dl_dir = os.sep.join([DOWNLOAD_DIRECTORY, foldername])
    full_filename = os.sep.join([dl_dir, filename])
    os.makedirs(dl_dir, exist_ok=True)
    response = requests.get(download_url, stream=True)
    try:
        with open(full_filename, 'wb') as fd:
            for chunk in response.iter_content(chunk_size=512 * 1024):
                fd.write(chunk)
        upload_success = upload_recording(full_filename, filename)
        return upload_success
    except Exception as e:
        # if there was some exception, print the error and return False
        print(e)
        return upload_success

def main():
    scope = ["https://spreadsheets.google.com/feeds", "https://www.googleapis.com/auth/spreadsheets",
             "https://www.googleapis.com/auth/drive.file", "https://www.googleapis.com/auth/drive"]
    creds = ServiceAccountCredentials.from_json_keyfile_name('creds.json', scope)
    client = gspread.authorize(creds)
    sheet = client.open("Zoom Recordings Data").sheet1
    data = sheet.get_all_records()
    # Get the Recordings information that are needed to download
    for index in range(len(sheet.col_values(9)) + 1, len(data) + 2):
        success = False
        getRow = sheet.row_values(index)
        session_name = getRow[0]
        topic = getRow[1]
        topic = topic.replace('/', '')
        topic = topic.replace(':', '')
        account_name = getRow[2]
        start_date = getRow[3]
        file_size = getRow[4]
        file_type = getRow[5]
        url_token = getRow[6] + '?access_token=' + getRow[7]
        file_name = start_date + ' - ' + topic + '.' + file_type.lower()
        file_destination = session_name + '/' + account_name + '/' + topic
        success |= download_recording(url_token, file_destination, file_name)
        # Update status on Google Sheet
        if success:
            cell = 'I' + str(index)
            sheet.update_acell(cell, 'success')

if __name__ == "__main__":
    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        'creds.json',
        scopes='https://www.googleapis.com/auth/drive'
    )
    delegated_credentials = credentials.create_delegated('Service_Account_client_email')
    access_token = delegated_credentials.get_access_token().access_token
    main()
I'm still trying to figure out how to upload the video to the folder where it needs to be. I'm very new to Python and the Drive API. I would very much appreciate it if you could give me some suggestions.
How about this answer?
Issue and solution:
Even though the response from the Resumable Upload method is 200 and it also gives me the Id of the file, I can't seem to find the file anywhere in my Google Drive. Below is my code.
I think that your script is correct for the resumable upload. From your situation and your script, I understood that your script worked and the file was successfully uploaded to Google Drive with the resumable upload.
And when I saw your issue of "I can't seem to find the file anywhere in my Google Drive" together with your script, I noticed that you are uploading the file using the access token retrieved by the service account. In this case, the uploaded file is put in the Drive of the service account, which is different from your own Google Drive. Because of this, you cannot see the uploaded file in your browser. In this case, I would like to propose the following 2 methods.
Pattern 1:
The owner of the uploaded file is the service account. In this pattern, you share the uploaded file with your Google account. The function upload_recording is modified as follows. Please set the email address of your Google account as emailAddress.
Modified script:
def upload_recording(file_location, file_name):
    filesize = os.path.getsize(file_location)
    # Retrieve session for resumable upload.
    headers1 = {"Authorization": "Bearer " + access_token, "Content-Type": "application/json"}  # Modified
    params = {
        "name": file_name,
        "mimeType": "video/mp4"
    }
    r = requests.post(
        "https://www.googleapis.com/upload/drive/v3/files?uploadType=resumable",
        headers=headers1,  # Modified
        data=json.dumps(params)
    )
    print(r)
    location = r.headers['Location']
    # Upload the file.
    headers2 = {"Content-Range": "bytes 0-" + str(filesize - 1) + "/" + str(filesize)}  # Modified
    r = requests.put(
        location,
        headers=headers2,  # Modified
        data=open(file_location, 'rb')
    )
    # I added below script.
    fileId = r.json()['id']
    permissions = {
        "role": "writer",
        "type": "user",
        "emailAddress": "###"  # <--- Please set your email address of your Google account.
    }
    r2 = requests.post(
        "https://www.googleapis.com/drive/v3/files/" + fileId + "/permissions",
        headers=headers1,
        data=json.dumps(permissions)
    )
    print(r2.text)
    return True
When you run the above modified script, you can see the uploaded file under "Shared with me" in your Google Drive.
Pattern 2:
In this pattern, the file is uploaded to the shared folder using the resumable upload with the service account. So at first, please prepare a folder in your Google Drive and share the folder with the email of the service account.
Modified script:
Please modify the function upload_recording as follows. And please set the folder ID you shared with the service account.
From:
params = {
    "name": file_name,
    "mimeType": "video/mp4"
}
To:
params = {
    "name": file_name,
    "mimeType": "video/mp4",
    "parents": ["###"]  # <--- Please set the folder ID you shared with the service account.
}
When you run the above modified script, you can see the uploaded file in the shared folder of your Google Drive.
Note:
In this case, the owner is the service account. Of course, the owner can also be changed.
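If you do want to change the owner, below is a sketch using the same permissions endpoint, reusing fileId and headers1 from the Pattern 1 script. Whether Drive allows the transfer depends on the account types involved, so please treat this as an assumption to verify.

# Creating an "owner" permission with transferOwnership=true hands the file over.
r3 = requests.post(
    "https://www.googleapis.com/drive/v3/files/" + fileId + "/permissions?transferOwnership=true",
    headers=headers1,
    data=json.dumps({
        "role": "owner",
        "type": "user",
        "emailAddress": "###"  # <--- Please set the email address of the new owner.
    })
)
print(r3.text)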
Reference:
Permissions: create
Suppose there's a web page that allows a user to upload a file. The file can be of any format/size.
There's a REST-API-based Flask server running on an EC2 instance. In my case it's CentOS 7+.
Once I receive the file in the HTTP request, I first need to save it on the CentOS server. After the file is saved, I have to upload it to Azure File Storage.
I can do all of the above without any problem.
However, is there a way to upload the file from the incoming request to Azure without first saving it on the CentOS server?
Below is what I have done so far.
app.py
# Upserts uploads folder
base_path_uploads = os.path.dirname(__file__) + '/uploads'

def create_directories(token):
    # Create a directory specific to the logged-in client. All the call recording clips will be stored under it.
    path_to_tokenized_directories = (base_path_uploads + "/" + token)
    os.makedirs(path_to_tokenized_directories, exist_ok=True)
    return path_to_tokenized_directories

@app.route('/user/upload', methods=['POST'])
def upload():
    # First, we shall fetch the token and see if a directory corresponding to its value exists
    try:
        # This is where I can receive one or more files
        file = list(request.files['files'])
        token = request.form['token']
        # Path where user-specific files are uploaded/deleted
        uploads_path = create_directories(token)
        # Save file on the current server
        if file and allowed_file(file.filename):
            filename = secure_filename(file.filename)
            file.save(os.path.join(app.config['UPLOAD_FOLDER'] + "/" + token, filename))
        files = file_share_crud_ops.azure_crud(token, "upload", uploads_path)
        api_response = jsonify({
            "status": "success",
            "message": "",
            "data": {
                "files": files
            }
        })
        api_response.status_code = 200
    except Exception as e:
        api_response = jsonify({
            "status": "failure",
            "message": "Unable to fetch your workspace data"
        })
        api_response.status_code = 500
    return api_response
file_share_crud_ops
def azure_crud(self, token, operation, user_directory_path):
    try:
        # Get the directory client
        user_workspace = share.get_directory_client(directory_path=token)
        # [START create_directory]
        try:
            user_workspace.create_directory()
        except:
            # Directory has already been created
            pass
        if "read" in operation:
            files_and_folders = list(share.list_directories_and_files(directory_name=token))
            for files_and_folder in files_and_folders:
                files_and_folder['id'] = "1"
                files_and_folder['type'] = "file"
                files_and_folder['owner'] = "me"
                files_and_folder['modified'] = ""
                files_and_folder['opened'] = ""
                files_and_folder['created'] = ""
                files_and_folder['extention'] = ""
                files_and_folder['location'] = ""
                files_and_folder['offline'] = True
            return files_and_folders
        elif "upload" in operation:
            # Upload to Azure Files
            for filename in os.listdir(user_directory_path):
                with open(filename, "rb") as source:
                    user_workspace.upload_file(file_name="visualization_-_aerial", data=source)
            # List files in the directory
            files_and_folders = list(share.list_directories_and_files(directory_name=token))
            return files_and_folders
    except Exception as e:
        raise ValueError("Error while Creating or Fetching share content")
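To address the actual question of skipping the local save: Werkzeug exposes each incoming upload as a file-like stream, and the Azure Files SDK can read directly from it. Below is a minimal sketch assuming the v12 azure-storage-file-share package; the connection string, share name, and route are placeholders, and it handles a single file for brevity.

from flask import Flask, request, jsonify
from azure.storage.fileshare import ShareDirectoryClient

app = Flask(__name__)

@app.route('/user/upload_direct', methods=['POST'])  # hypothetical route
def upload_direct():
    token = request.form['token']
    incoming = request.files['files']  # a werkzeug FileStorage object
    directory = ShareDirectoryClient.from_connection_string(
        conn_str="<connection string>",  # placeholder
        share_name="<share name>",       # placeholder
        directory_path=token,
    )
    try:
        directory.create_directory()
    except Exception:
        pass  # directory already exists
    # Werkzeug usually buffers the upload into a seekable temporary stream,
    # so the SDK can size and read it without the file ever touching disk.
    directory.upload_file(file_name=incoming.filename, data=incoming.stream)
    return jsonify({"status": "success"}), 200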
So I'm trying to follow the Microsoft Face API documentation here for the "FindSimilar" feature. There is an example at the bottom of the page, where I use this code:
########### Python 3.2 #############
import http.client, urllib.request, urllib.parse, urllib.error, base64

headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': '{api key}',
}

params = urllib.parse.urlencode({
})

try:
    conn = http.client.HTTPSConnection('westus.api.cognitive.microsoft.com')
    conn.request("POST", "/face/v1.0/findsimilars?%s" % params, "{body}", headers)
    response = conn.getresponse()
    data = response.read()
    print(data)
    conn.close()
except Exception as e:
    print("[Errno {0}] {1}".format(e.errno, e.strerror))
I'm getting an error where it tells me my subscription key is invalid, but I checked my Azure account status and I see no issues:
b'\n\t\t\t\t\t{"error":{"code":"Unspecified","message":"Access denied due to invalid subscription key. Make sure you are subscribed to an API you are trying to call and provide the right key."}}\n \t\t'
Access denied due to invalid subscription key. Make sure you are subscribed to an API you are trying to call and provide the right key.
It indicates that the subscription key is invalid or the user/plan is blocked. I recommend that you check the API key:
headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': '3c658abc64b348c1a5...',
}
I am trying to analyze a video via Emotion API by Microsoft using Python 3.2
I am encountering the following error:
b'{ "error": { "code": "Unauthorized", "message": "Access denied due to invalid subscription key. Make sure you are subscribed to an API you are trying to call and provide the right key." } }'
I am using the Emotion API subscription key (I have also tried the Face API key and the Computer Vision key, just in case).
Code:
import http.client, urllib.request, urllib.parse, urllib.error, base64
headers = {
# Request headers
'Ocp-Apim-Subscription-Key': '{subscription key}',
}
params = urllib.parse.urlencode({
})
try:
conn = http.client.HTTPSConnection('westus.api.cognitive.microsoft.com')
conn.request("GET", "/emotion/v1.0/operations/{oid}?%s" % params, "{body}", headers)
response = conn.getresponse()
data = response.read()
print(data)
conn.close()
except Exception as e:
print("[Errno {0}] {1}".format(e.errno, e.strerror))
Your code works. Just make sure you wait 10 minutes after generating the API key so that it starts working (it says so in the Azure Portal).
Also, in general for Cognitive Services, make sure that the API key you have corresponds to the region you're trying to hit (West US, etc.)
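If you want to sanity-check the key/region pairing before digging further, one option is to hit the regional endpoint directly and read the status code. A sketch in the same http.client style; the region and key are placeholders, and note that the Emotion API itself has since been retired:

import http.client

region = "westus"           # must match the region where the resource was created
key = "<subscription key>"  # the key from that same resource

conn = http.client.HTTPSConnection("%s.api.cognitive.microsoft.com" % region)
conn.request(
    "POST", "/emotion/v1.0/recognize",
    '{"url": "https://example.com/face.jpg"}',
    {"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
)
response = conn.getresponse()
# 401 means the key does not belong to this region/resource; 200 or 400 means it was accepted.
print(response.status, response.read())
conn.close()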