Access Google Cloud Natural Language with Google Colab

I am attempting to use Google's Cloud Natural Language API with Google Colab.
I started by following Google's simple example: https://cloud.google.com/natural-language/docs/samples/language-entity-sentiment-text#language_entity_sentiment_text-python
So, my Colab notebook was literally just one code cell:
from google.cloud import language_v1
client = language_v1.LanguageServiceClient()
text_content = 'Grapes are good. Bananas are bad.'
# Available types: PLAIN_TEXT, HTML
type_ = language_v1.types.Document.Type.PLAIN_TEXT
# Optional. If not specified, the language is automatically detected.
# For list of supported languages:
# https://cloud.google.com/natural-language/docs/languages
language = "en"
document = {"content": text_content, "type_": type_, "language": language}
# Available values: NONE, UTF8, UTF16, UTF32
encoding_type = language_v1.EncodingType.UTF8
response = client.analyze_entity_sentiment(request = {'document': document, 'encoding_type': encoding_type})
That resulted in several error messages, which I seemed to resolve, mostly with the help of this SO post, by slightly updating the code as follows:
from google.cloud import language_v1
client = language_v1.LanguageServiceClient()
text_content = 'Grapes are good. Bananas are bad.'
# Available types: PLAIN_TEXT, HTML
type_ = language_v1.types.Document.Type.PLAIN_TEXT
# Optional. If not specified, the language is automatically detected.
# For list of supported languages:
# https://cloud.google.com/natural-language/docs/languages
language = "en"
#document = {"content": text_content, "type_": type_, "language": language} ## "type_" is not valid???
document = {"content": text_content, "type": type_, "language": language}
# Available values: NONE, UTF8, UTF16, UTF32
#encoding_type = language_v1.EncodingType.UTF8 ## Does not seem to work
encoding_type = "UTF8"
#response = client.analyze_entity_sentiment(request = {'document': document, 'encoding_type': encoding_type}) ## remove request
response = client.analyze_entity_sentiment( document = document, encoding_type = encoding_type )
Which, after 10 excruciating minutes, results in the following error:
_InactiveRpcError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
72 try:
---> 73 return callable_(*args, **kwargs)
74 except grpc.RpcError as exc:
11 frames
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Getting metadata from plugin failed with error: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7f68cee39a90>)"
debug_error_string = "{"created":"#1648840699.964791285","description":"Getting metadata from plugin failed with error: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7f68cee39a90>)","file":"src/core/lib/security/credentials/plugin/plugin_credentials.cc","file_line":91,"grpc_status":14}"
>
The above exception was the direct cause of the following exception:
ServiceUnavailable Traceback (most recent call last)
ServiceUnavailable: 503 Getting metadata from plugin failed with error: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7f68cee39a90>)
The above exception was the direct cause of the following exception:
RetryError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/six.py in raise_from(value, from_value)
RetryError: Deadline of 600.0s exceeded while calling functools.partial(<function _wrap_unary_errors.<locals>.error_remapped_callable at 0x7f68cedb69e0>, document {
type: PLAIN_TEXT
content: "Grapes are good. Bananas are bad."
language: "en"
}
encoding_type: UTF8
, metadata=[('x-goog-api-client', 'gl-python/3.7.13 grpc/1.44.0 gax/1.26.3 gapic/1.2.0')]), last exception: 503 Getting metadata from plugin failed with error: ("Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Enginemetadata service. Status: 404 Response:\nb''", <google.auth.transport.requests._Response object at 0x7f68cee39a90>)
Can you please help me with this simple "Hello world!" for Cloud Natural Language with Google Colab?
My hunch is that I need to create a service account and somehow provide that key file to Colab, like this SO answer. If so, can you hold my hand a little more and tell me how I would implement that in Colab (vs. running locally)? I am new to Colab.

This appears to have worked:
Start by creating a service account, generating a key file, and saving the JSON file locally. https://console.cloud.google.com/iam-admin/serviceaccounts
(I would still love to know which roles, if any, I should select during service account creation: "Grant this service account access to project".)
Cell 1: Upload the JSON file with my service account key
from google.colab import files
uploaded = files.upload()
Cell 2:
from google.oauth2 import service_account
from google.cloud import language_v1
client = language_v1.LanguageServiceClient.from_service_account_json("my-super-important-gcp-key-file.json")
Cell 3:
text_content = 'Grapes are good. Bananas are bad.'
type_ = language_v1.types.Document.Type.PLAIN_TEXT
language = "en"
document = {"content": text_content, "type": type_, "language": language}
encoding_type = "UTF8"
response = client.analyze_entity_sentiment( document = document, encoding_type = encoding_type )
response
Here is the output:
entities {
  name: "Grapes"
  type: OTHER
  salience: 0.8335162997245789
  mentions {
    text {
      content: "Grapes"
    }
    type: COMMON
    sentiment {
      magnitude: 0.800000011920929
      score: 0.800000011920929
    }
  }
  sentiment {
    magnitude: 0.800000011920929
    score: 0.800000011920929
  }
}
entities {
  name: "Bananas"
  type: OTHER
  salience: 0.16648370027542114
  mentions {
    text {
      content: "Bananas"
      begin_offset: 17
    }
    type: COMMON
    sentiment {
      magnitude: 0.699999988079071
      score: -0.699999988079071
    }
  }
  sentiment {
    magnitude: 0.699999988079071
    score: -0.699999988079071
  }
}
language: "en"
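To work with the response programmatically instead of reading the printed proto, here is a minimal sketch (field names taken from the output above):

for entity in response.entities:
    # Each entity carries its own salience and an aggregated sentiment.
    print(entity.name, entity.salience, entity.sentiment.score, entity.sentiment.magnitude)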
I am certain that I have just violated all sorts of security protocols. So, please, I welcome any advice for how I should improve this process.
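One improvement that is often suggested for Colab, sketched here under the assumption that your own Google account can access a project with the Natural Language API enabled (I have not verified this end to end), is to use Colab's built-in authentication instead of uploading a long-lived key file:

from google.colab import auth
from google.cloud import language_v1

# Prompts for interactive sign-in and stores short-lived credentials
# in the runtime instead of a service-account key file.
auth.authenticate_user()

# The client should then pick up those credentials automatically.
client = language_v1.LanguageServiceClient()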

Related

Overwrite Django REST default validation error handler

I am using Django REST Framework for my back end and want to override the default error messages for fields.
My current code looks like this:
class DeckSerializer(serializers.ModelSerializer):
    class Meta:
        model = Product
        fields = (
            "id",
            "title",
            "get_absolute_url",
            "description",
            "price",
            "image",
            "category_id",
            "category",
        )
        extra_kwargs = {
            'title': {"error_messages": {"required": "Title cannot be empty"}},
            'image': {"error_messages": {"required": "Image cannot be empty"}},
        }
After writing these two kwargs I realised I would just be repeating something that could be solved in code.
By default, serializer validation returns this when a field is missing: {title: "This field is required"}.
Is there any way I can overwrite the current message so it displays name_of_the_field + my_message, e.g. {title: "Title is required"}?
I am not looking for how to write a custom error message for a single field; I'm looking for how to write generic custom messages for every field that is, for example, missing or null.
We can achieve this by writing a custom exception handler.
Here is what a custom response might look like:
{
    "status_code": 400,
    "type": "ValidationError",
    "message": "Bad request syntax or unsupported method",
    "errors": [
        "username: This field may not be null.",
        "email: This field may not be null.",
        "ticket number: This field may not be null."
    ]
}
We have to create a file, exception_handler.py, in our project directory with the code that follows; I use a utils package for this kind of purpose. You can also put this code anywhere you like, but I prefer to have it in a separate file dedicated to this purpose.
from http import HTTPStatus

from rest_framework import exceptions
from rest_framework.views import Response, exception_handler


def api_exception_handler(exception: Exception, context: dict) -> Response:
    """Custom API exception handler."""
    # Call REST framework's default exception handler first,
    # to get the standard error response.
    response = exception_handler(exception, context)

    # Only alter the response when it's a validation error
    if not isinstance(exception, exceptions.ValidationError):
        return response

    # It's a validation error, so there should be a Serializer
    view = context.get("view", None)
    serializer = view.get_serializer_class()()

    errors_list = []
    for key, details in response.data.items():
        if key in serializer.fields:
            label = serializer.fields[key].label
            help_text = serializer.fields[key].help_text
            for message in details:
                errors_list.append("{}: {}".format(label, message))
        elif key == "non_field_errors":
            for message in details:
                errors_list.append(message)
        else:
            for message in details:
                errors_list.append("{}: {}".format(key, message))

    # Use the descriptions of the HTTPStatus class as error messages.
    http_code_to_message = {v.value: v.description for v in HTTPStatus}

    error_payload = {
        "status_code": 0,
        "type": "ValidationError",
        "message": "",
        "errors": [],
    }
    status_code = response.status_code
    error_payload["status_code"] = status_code
    error_payload["message"] = http_code_to_message[status_code]
    error_payload["errors"] = errors_list

    # Overwrite the default exception_handler response data
    response.data = error_payload
    return response
The main idea comes from here, but I adapted it to my needs; change it as you see fit.
Don't forget to set it as your default exception handler in your settings.py file:
REST_FRAMEWORK["EXCEPTION_HANDLER"] = "utils.exception_handler.api_exception_handler"
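If REST_FRAMEWORK is not yet defined in settings.py, the same setting written as a full dict would look like this minimal sketch (assuming the handler lives in utils/exception_handler.py):

# settings.py
REST_FRAMEWORK = {
    "EXCEPTION_HANDLER": "utils.exception_handler.api_exception_handler",
}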

Getting Model Invalid message with Azure Form Recognizer API

I am trying to use the Microsoft Azure Form Recognizer API to upload invoice PDFs and get the table info inside them.
I was able to make a successful POST request.
But I am not able to train the model; I get the error 'No valid blobs found in the specified Azure blob container. Please conform to the document format/size/page/dimensions requirements.'
However, I have more than five files in the blob storage container.
I have also provided the shared key for the blob container. You can find the code I have written and the error below.
"""
Created on Thu Feb 20 16:22:41 2020
#author: welcome
"""
########## Python Form Recognizer Labeled Async Train #############
import json
import time
from requests import get, post
# Endpoint URL
endpoint = r"https://sctesting.cognitiveservices.azure.com"
post_url = endpoint + r"/formrecognizer/v2.0-preview/custom/models"
print(post_url)
source = '<source url from blob storage>'
prefix = "name of the folder"
includeSubFolders = False
useLabelFile = False
headers = {
# Request headers
'Content-Type': 'application/json',
'Ocp-Apim-Subscription-Key': '<key>',
}
body = {
"source": source,
"sourceFilter": {
"prefix": prefix,
"includeSubFolders": includeSubFolders
},
"useLabelFile": useLabelFile
}
try:
resp = post(url = post_url, json = body, headers = headers)
if resp.status_code != 201:
print("POST model failed (%s):\n%s" % (resp.status_code, json.dumps(resp.json())))
quit()
print("POST model succeeded:\n%s" % resp.headers)
get_url = resp.headers["location"]
except Exception as e:
print("POST model failed:\n%s" % str(e))
quit()
n_tries = 15
n_try = 0
wait_sec = 3
max_wait_sec = 60
while n_try < n_tries:
try:
resp = get(url = get_url, headers = headers)
resp_json = resp.json()
if resp.status_code != 200:
print("GET model failed (%s):\n%s" % (resp.status_code, json.dumps(resp_json)))
quit()
model_status = resp_json["modelInfo"]["status"]
if model_status == "ready":
print("Training succeeded:\n%s" % json.dumps(resp_json))
quit()
if model_status == "invalid":
print("Training failed. Model is invalid:\n%s" % json.dumps(resp_json))
quit()
# Training still running. Wait and retry.
time.sleep(wait_sec)
n_try += 1
wait_sec = min(2*wait_sec, max_wait_sec)
except Exception as e:
msg = "GET model failed:\n%s" % str(e)
print(msg)
quit()
print("Train operation did not complete within the allocated time.")
Output from running the above code in the Anaconda prompt:
POST model succeeded:
{'Content-Length': '0', 'Location': 'https://sctesting.cognitiveservices.azure.com/formrecognizer/v2.0-preview/custom/models/30b7d99b-fc57-466d-a59b-c0d9738c03ac', 'x-envoy-upstream-service-time': '379', 'apim-request-id': '18cbec13-8129-45de-8685-83554e8b35d4', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'x-content-type-options': 'nosniff', 'Date': 'Thu, 20 Feb 2020 19:35:47 GMT'}
Training failed. Model is invalid:
{"modelInfo": {"modelId": "30b7d99b-fc57-466d-a59b-c0d9738c03ac", "status": "invalid", "createdDateTime": "2020-02-20T19:35:48Z", "lastUpdatedDateTime": "2020-02-20T19:35:50Z"}, "trainResult": {"trainingDocuments": [], "errors": [{"code": "2014", "message": "No valid blobs found in the specified Azure blob container. Please conform to the document format/size/page/dimensions requirements."}]}}
If you use the Form Recognizer labeling tool to do the same thing, does that work? Have you put the files in the root directory of the Azure blob container, or in a subdirectory?
Make sure that the files in your blob storage container fit the requirements here: https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/overview#custom-model
If your files look fine, also check what kind of SAS token you are using. The error message you have can occur if you are using a policy-defined SAS token, in which case try switching to a SAS token with explicit permissions as detailed here: https://stackoverflow.com/a/60235222/12822344
You didn't specify a source. You need to generate a Shared Access Signature (SAS) from the menu of the selected storage account. If you have a container in that storage account, you'll need to include the container name in the URL. For example, if you have a container named "train":
"www....windows.net/?sv=....." ---> "www....windows.net/train?sv=......".
Otherwise you can try to use the "prefix" string, but I found it buggy.
Also, you have not included your subscription key.
https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/python-train-extract
Try removing the Source Filter from the body. It should work.
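For what it's worth, a minimal sketch of the request body without the sourceFilter, as that suggestion describes; the SAS URL is a placeholder, not a value from the original post:

# Train request body with no sourceFilter; the SAS URL must include the
# container name and grant read/list permissions.
body = {
    "source": "https://<account>.blob.core.windows.net/<container>?<sas-token>",
    "useLabelFile": False,
}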

KeyError: 'Records' while trying to process data from a Kinesis input stream in AWS Lambda

I am trying to send data to a stream and then use Kinesis Firehose to deliver the data to Elasticsearch. I am using a Python Lambda function to convert the data to JSON before pushing, but the Lambda is failing with the error below.
[ERROR] KeyError: 'Records'
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 7, in lambda_handler
for record in event["Records"]:
I am able to see the records in the shard using a shard iterator, as shown below.
{
    "Records": [
        {
            "SequenceNumber": "49599580114447666780699883212202628138922281244234350610",
            "ApproximateArrivalTimestamp": 1568741427.415,
            "Data": "MjAwNi8wMS8wMSAwMDowMDowMHwzMTA4IE9DQ0lERU5UQUwgRFJ8M3wzQyAgICAgICAgfDExMTV8MTA4NTEoQSlWQyBUQUtFIFZFSCBXL08gT1dORVJ8MjQwNHwzOC41NTA0MjA0N3wtMTIxLjM5MTQxNTh8MjAxOS8wOS8xNyAyMzowMDoyNA==",
            "PartitionKey": "1"
        },
I am using the Lambda function below to process the stream.
import base64  # needed for base64.b64decode below
import json

print("Loading the function")

success = 0
failure = 0

def lambda_handler(event, context):
    # success is defined at module level, so declare it global before incrementing.
    global success
    result = {}
    for record in event["Records"]:
        print(record)
        payload = base64.b64decode(record["Data"]).decode('utf-8')
        match = payload.split('|')
        result = {}
        if match:
            # create a dict object of the row, building all fields from the array
            result["crime_time"] = match[0]
            result["address"] = match[1]
            result['district'] = int(match[2])
            result['beat'] = match[3]
            result['grid'] = int(match[4])
            result['description'] = match[5]
            result['crime_id'] = int(match[6])
            result['latitude'] = float(match[7])
            result['longitude'] = float(match[8])
            result['load_time'] = match[9]
            result['location'] = {
                'lat': float(match[7]),
                'lon': float(match[8])
            }
            success += 1
    return {
        'statusCode': 200,
        'body': json.dumps(result)
    }
But I am getting the error in the Lambda function after sending data to the stream.
What is the error, and did you check the CloudWatch logs to see what is happening? I feel this will give you a good indication of what is going on.
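A minimal sketch of the kind of diagnostic logging that comment suggests (a hypothetical handler, not the original code): print the raw event so its structure shows up in CloudWatch Logs and confirm whether a top-level "Records" key is actually present.

import json

def lambda_handler(event, context):
    # Log the raw event so its shape can be inspected in CloudWatch Logs.
    print(json.dumps(event))
    records = event.get("Records", [])
    if not records:
        print("No 'Records' key found; top-level keys: %s" % list(event.keys()))
    return {"statusCode": 200, "body": json.dumps({"record_count": len(records)})}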

Get Facebook Marketing API Ads insights results as CSV or JSON format

I am attempting to use the Facebook-Python-Ads-SDK to automate reporting on ad account performance. I have successfully requested a report at the ad set level; however, the output of the report is a Cursor object, whereas I would prefer JSON or CSV. I have tried the "export_format" option in params, but it does not seem to make any difference. The output looks like JSON, so I attempted to load the object as a DataFrame in pandas using pd.read_json(result), but it gives an error saying that an object of type "Cursor" needs to be str or bytes.
Does anyone have experience with this API and can help me out? My code is below.
def report_request(start_date, end_date):
    fields = [
        'date_start',
        'account_name',
        'adset_name',
        'ad_name',
        'impressions',
        'clicks',
        'spend'
    ]
    params = {
        'time_range': {
            'since': start_date,
            'until': end_date,
        },
        'level': 'ad',
        'export_format': 'csv'
    }
    account_id = '<ACCOUNT_ID>'
    adAccount = AdAccount('act_' + account_id)
    api_batch = get_api().new_batch()
    # note: newer facebook_business SDK versions rename this parameter to is_async
    request = adAccount.get_insights(fields=fields, params=params, async=False, batch=api_batch)
    result = request.execute()
    return result
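There is no accepted answer here, but one common pattern, sketched under the assumption that the SDK's export_all_data() helper and pandas are available (the date range below is made up), is to iterate the Cursor and build a DataFrame yourself:

import pandas as pd

# The Cursor is iterable; each item is an insights object that can be
# exported to a plain dict and then written out as CSV.
result = report_request('2020-01-01', '2020-01-31')  # hypothetical dates
rows = [insight.export_all_data() for insight in result]
df = pd.DataFrame(rows)
df.to_csv('ad_insights.csv', index=False)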

Microsoft Emotion Video API Python 3.2

I am trying to analyze a video via Microsoft's Emotion API using Python 3.2.
I am encountering the following error:
b'{ "error": { "code": "Unauthorized", "message": "Access denied due to invalid subscription key. Make sure you are subscribed to an API you are trying to call and provide the right key." } }'
I am using the Emotion API subscription key (I have also tried the Face API key and the Computer Vision key, just in case).
Code:
import http.client, urllib.request, urllib.parse, urllib.error, base64

headers = {
    # Request headers
    'Ocp-Apim-Subscription-Key': '{subscription key}',
}

params = urllib.parse.urlencode({
})

try:
    conn = http.client.HTTPSConnection('westus.api.cognitive.microsoft.com')
    conn.request("GET", "/emotion/v1.0/operations/{oid}?%s" % params, "{body}", headers)
    response = conn.getresponse()
    data = response.read()
    print(data)
    conn.close()
except Exception as e:
    print("[Errno {0}] {1}".format(e.errno, e.strerror))
Your code works. Just make sure you wait 10 minutes after generating the API key so that it starts working (it says so in the Azure Portal).
Also, in general for Cognitive Services, make sure that the API key you have corresponds to the region you're trying to hit (West US, etc.)
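A small sketch of that region check; the region string and key below are placeholders, not values from the original post:

import http.client

# The hostname must match the region where the key was issued
# (e.g. 'westus', 'eastus2'), as shown for the resource in the Azure Portal.
region = 'westus'
subscription_key = '<your Emotion API key>'
conn = http.client.HTTPSConnection('{}.api.cognitive.microsoft.com'.format(region))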
