Document AI unsupported input file format

Since the last update to the Document AI Node.js API, I can no longer send JPEG files. At first I received the following message:
Error: 3 INVALID_ARGUMENT: At this time, the only MIME types supported are 'application/pdf','application/json', 'image/gif' and 'image/tiff'.
When I changed my code to handle TIFF images, I got the following message instead:
"(node:15782) UnhandledPromiseRejectionWarning: Error: 3 INVALID_ARGUMENT: Unsupported input file format."
I'm sure the file is a TIFF: I store it in Cloud Storage first, and its content type is shown as "image/tiff".
I attached some images for clarification.

The Document AI API has been updated since this post was originally made. I recommend using the v1 REST API and the Node.js client libraries.
The Supported Files page in the documentation also lists the supported file types with the appropriate MIME types.
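As a rough illustration of the recommendation above, the sketch below checks a file's leading bytes before building a v1 `processDocument` request. The Cloud Storage content type is only metadata, so a file labelled "image/tiff" is not guaranteed to contain TIFF data. `sniffMimeType` and `buildProcessRequest` are hypothetical helper names, and the signature table is a small illustrative subset, not the list of types Document AI accepts.

```javascript
// Hypothetical helpers (not part of the Document AI library).
// Leading "magic" bytes of a few common formats; illustrative subset only.
const SIGNATURES = [
  { mimeType: 'application/pdf', bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
  { mimeType: 'image/tiff', bytes: [0x49, 0x49, 0x2a, 0x00] },      // little-endian TIFF
  { mimeType: 'image/tiff', bytes: [0x4d, 0x4d, 0x00, 0x2a] },      // big-endian TIFF
  { mimeType: 'image/gif', bytes: [0x47, 0x49, 0x46, 0x38] },       // "GIF8"
  { mimeType: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { mimeType: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47] },
];

// Returns the MIME type implied by the first bytes, or null if unknown.
function sniffMimeType(buffer) {
  const match = SIGNATURES.find(({ bytes }) =>
    bytes.every((byte, i) => buffer[i] === byte));
  return match ? match.mimeType : null;
}

// Builds the request object passed to processDocument in the v1 client:
// the processor resource name plus base64 content and its MIME type.
function buildProcessRequest(processorName, buffer) {
  const mimeType = sniffMimeType(buffer);
  if (!mimeType) {
    throw new Error('Unrecognized file signature; check the file contents');
  }
  return {
    name: processorName,
    rawDocument: { content: buffer.toString('base64'), mimeType },
  };
}

// Usage with the client library (requires @google-cloud/documentai and
// credentials, so it is left as a comment; the file name is a placeholder):
// const { DocumentProcessorServiceClient } = require('@google-cloud/documentai').v1;
// const client = new DocumentProcessorServiceClient();
// const name = `projects/${projectId}/locations/${location}/processors/${processorId}`;
// const [result] = await client.processDocument(
//   buildProcessRequest(name, require('fs').readFileSync('scan.tiff')));
```

If the sniffed type differs from what the code was sending, that mismatch is a plausible (though unconfirmed) cause of the "Unsupported input file format" error.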

I once had a similar problem with a PDF file.
I uploaded PDF files to Google Cloud Storage and tried to run the Document AI Node.js API on them, but I got the same error as you:
"(node:15782) UnhandledPromiseRejectionWarning: Error: 3 INVALID_ARGUMENT: Unsupported input file format."
In my code, the mimeType was set to 'application/pdf'.
The problem went away after I set the mimeType to 'PDF'.
I hope this helps, even a little.

Related

Why does Firebase Service Account credential JSON not have the form of a JSON?

I have not attempted to use the Firebase Admin SDK for some time, so I apologize if this ends up being trivial but I have spent two days on this.
I am creating a new web and mobile app using Firebase, for which I have a data model layer shared between the web and mobile client apps. I want to set up automated testing of the data models using the Firebase Node.js Admin SDK.
So, I followed the instructions here https://firebase.google.com/docs/admin/setup
However, although the service account credentials I download from Firebase arrive as a .json file, the file does not have the form of a JSON file. It is just a long alphanumeric string with an '=' at the end.
As expected, exporting the environment variable ($ export GOOGLE_APPLICATION_CREDENTIALS=...) and then calling useApplicationDefault() results in an unexpected token error.
If I attempt to reconstruct the data type which I think is expected and pull the string in the file into a properly formatted JSON with the key "privateKey", then I get this error:
FirebaseAppError: Invalid contents in the credentials file
If I attempt to use the code snippet provided by Firebase on the Service Account page of my project, with the raw unedited non-JSON .json file, I still get unexpected token, as expected, but if I use the edited .json file with the correctly-formatted JSON, I get a PEM error.
FirebaseAppError: Failed to parse private key: Error: Invalid PEM formatted message.
As stated, the .json file Firebase provides to me is not a JSON and only contains an alphanumeric string terminated by an '=' sign.
My edited version has the form
{
"projectId": "myprojectid-id123",
"clientEmail": "email#domain.com",
"privateKey": "abcde1234567890="
}
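One diagnostic that might be worth trying: a string that ends in '=' looks like base64, so the file could be a base64-encoded copy of the real JSON key. This is only an assumption; nothing in the question confirms it. The sketch below uses a hypothetical helper, `tryDecodeCredential`, to test that guess without modifying the file.

```javascript
// Hypothetical diagnostic helper; the base64 guess is an assumption.

// Parses text as JSON and returns it only if it is a JSON object.
function parseJsonObject(text) {
  try {
    const value = JSON.parse(text);
    return value !== null && typeof value === 'object' ? value : null;
  } catch (err) {
    return null;
  }
}

// Reports whether the file text is plain JSON, base64-encoded JSON,
// or neither.
function tryDecodeCredential(fileText) {
  const trimmed = fileText.trim();
  const plain = parseJsonObject(trimmed);
  if (plain) return { encoding: 'json', value: plain };

  const decoded = Buffer.from(trimmed, 'base64').toString('utf8');
  const fromBase64 = parseJsonObject(decoded);
  if (fromBase64) return { encoding: 'base64-json', value: fromBase64 };

  return { encoding: 'unknown', value: null };
}

// Usage (the path is a placeholder):
// const text = require('fs').readFileSync('/path/to/downloaded-key.json', 'utf8');
// console.log(tryDecodeCredential(text).encoding);
```

If the result is 'base64-json', writing the decoded text to a new file and pointing GOOGLE_APPLICATION_CREDENTIALS at it may be enough. A service account key file normally contains fields such as "type", "project_id", "private_key" and "client_email", with the private key in PEM form, which would also explain the PEM error from the hand-built version.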

AWS lambda function proxies requests of fetching binary blob(PDF) from service layer and then returns to the client

I've created a Lambda function that I use for validation and that then proxies the request to the service layer. The service layer response contains a binary blob (a PDF), which passes through the Lambda function and then API Gateway before it finally reaches the client.
The first problem we ran into was that the PDF was transformed or corrupted: the client just received a blank PDF. Then I found this post, which did not make any sense to me at first, until I saw this AWS doc. It turns out the binary data must be base64-encoded and the indicator 'isBase64Encoded' set to true. The gateway eventually converts the response back into the binary blob.
TBH, I am new to AWS and I don't really understand why it works this way. What's wrong with passing the original binary blob through? Why are those conversion steps necessary?
Here is the list of things I had to do:
Configure */* as a Binary Media Type on the gateway. (I tried to use application/pdf, but it did not work.)
Make sure the response body from the service layer is not transformed into a string. (I am using request, which returns a string by default, so I send encoding: null along with the request.)
When I get the Buffer data from the service layer, I use Buffer to convert the response body into base64.
In the Lambda output, I set isBase64Encoded to true.
Finally, I get the unaltered PDF.
Can someone confirm that I am doing this the expected way? Or is there a better way?
Also, when we set the binary media type to */*, doesn't this mean it accepts all media types? I only want PDF to be supported.

This doc (https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-payload-encodings.html) should answer your question. There are two things to note:
You can pass the original binary file (blob) as well as a base64-encoded binary file through API Gateway.
Ref: https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-content-encodings-examples-image-lambda.html
*/* works in your case, but it means API Gateway will treat every payload as binary data, which breaks text payloads such as JSON. So, ideally, application/pdf should be used as the "Binary Media Type".
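The Lambda side of this flow can be sketched as follows. With the proxy integration, the function's result is a JSON object whose body is a string, so raw bytes have to travel as base64 text. `toBinaryProxyResponse` and `fetchPdfFromServiceLayer` are hypothetical names used for illustration.

```javascript
// Hypothetical helper: wraps a PDF Buffer in the response shape used by
// the Lambda proxy integration. The body must be a string, so the bytes
// are base64-encoded and flagged with isBase64Encoded.
function toBinaryProxyResponse(pdfBuffer) {
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/pdf' },
    isBase64Encoded: true,
    body: pdfBuffer.toString('base64'),
  };
}

// Sketch of a handler; fetchPdfFromServiceLayer is an assumed function
// that resolves to a Buffer (for example, a request made with
// encoding: null so the body is not converted to a string).
// exports.handler = async (event) => {
//   const pdfBuffer = await fetchPdfFromServiceLayer(event);
//   return toBinaryProxyResponse(pdfBuffer);
// };
```

API Gateway decodes the body back to bytes only when the response matches a configured binary media type, which is why that setting and the isBase64Encoded flag go together.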

Amazon S3 403 Forbidden Error for KML files but not JPG files

I am able to successfully upload (put object) jpg files to S3 with a particular code path, but receive a 403 forbidden error when using the same code path to upload a KML file. I am not restricting file types explicitly with "bucket policy," but feel that this must somehow be tied to bucket policy or CORS configuration.

I was using code based on the Heroku tutorial for uploading images to Amazon S3. The issue turned out to be the '+' symbol in the correct MIME type, "application/vnd.google-earth.kml+xml": it was being replaced with a space when our own endpoint for generating signed S3 requests read the file-type query parameter. We were able to fix this quickly by forcing the ContentType to "application/vnd.google-earth.kml+xml" for all KML files going to that endpoint.
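The '+' problem can be reproduced with Node's built-in query-string parsing. In a URL query, an unencoded '+' is decoded as a space, so the type the server signs no longer matches the one the browser sends. The parameter name `file-type` follows the answer above; the variable names are illustrative. An alternative to hard-coding the type on the server is to encode the value on the client.

```javascript
const kmlType = 'application/vnd.google-earth.kml+xml';

// Unencoded: the '+' is read back as a space.
const rawQuery = `file-type=${kmlType}`;
const brokenValue = new URLSearchParams(rawQuery).get('file-type');
console.log(brokenValue); // application/vnd.google-earth.kml xml

// Encoded on the client: '+' becomes %2B and survives the round trip.
const encodedQuery = `file-type=${encodeURIComponent(kmlType)}`;
const fixedValue = new URLSearchParams(encodedQuery).get('file-type');
console.log(fixedValue); // application/vnd.google-earth.kml+xml
```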

FIWARE object storage GE: Different type of responses obtained when downloading an image object

I can effectively make use of all operations available for Object Storage on my FIWARE account.
Nonetheless I have identified a strange behaviour when downloading objects from a container.
Please find below the procedure to reproduce that strange behaviour:
1. I upload two objects ("gonzo.png" and "elmo.png") to the container "photos".
1.1. First, by means of the cloud UI (https://cloud.lab.fiware.org/#objectstorage/containers/), I manually upload the object "gonzo.png".
1.2. Later, following the instructions in the Object Storage GE programmer's guide, I programmatically (or with the help of a standalone REST client) upload the object "elmo.png".
2. I download the objects from the container "photos".
2.1. First, following the instructions in the Object Storage GE programmer's guide, I successfully download the object "gonzo.png". The web service response body is the binary content of the object.
2.2. Later, following the same instructions as in step 2.1, I try to download the object "elmo.png". Now the web service response body is a JSON document with metadata and the binary content of the object.
What can I do to receive a standard response body for both objects, either binary or JSON?
Why do I get a different response if the object is originally uploaded via Cloud UI or via external tool (program or rest client) ?
As in Download blob from fiware object-storage I have already tried to set the header response_type: text and the behaviour is the same.

There are many object stores out there, having different APIs.
The Object Storage GE was initially based on the CDMI API [1].
Currently, it is based on Openstack Swift [2].
The Cloud Portal still uses some of the CDMI features. Specifically, it may base64-encode some types of objects, in which case the object content is a JSON document containing the metadata and a base64 encoding of the data. I suspect that this is what happened to the object you created using the cloud UI.
Thus, please use Swift native API for all operations.
The API is well documented here: http://developer.openstack.org/api-ref-objectstorage-v1.html
and the python examples in the programmer guide (https://forge.fiware.org/plugins/mediawiki/wiki/fiware/index.php/Object_Storage_-_User_and_Programmers_Guide) also use the Native API.
[1] Google for SNIA CDMI. Having less than 10 reputation, I cannot post too many links.
[2] google for Openstack Swift
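A download through the Swift native API comes down to a GET on the object's path under the account's storage URL, with the auth token in the X-Auth-Token header. The sketch below only builds that request. `swiftObjectRequest` is a hypothetical helper, and the storage URL and token are placeholders that would come from the authentication step.

```javascript
// Hypothetical helper: builds a Swift native-API download request.
// storageUrl is the account's storage URL returned at authentication
// (typically ending in /v1/AUTH_<account>); its value here is assumed.
function swiftObjectRequest(storageUrl, container, objectName, token) {
  const base = storageUrl.replace(/\/+$/, '');
  // Encode each path segment but keep '/' inside object names.
  const objectPath = objectName.split('/').map(encodeURIComponent).join('/');
  return {
    url: `${base}/${encodeURIComponent(container)}/${objectPath}`,
    options: { method: 'GET', headers: { 'X-Auth-Token': token } },
  };
}

// Usage with the global fetch available in Node 18+ (not run here):
// const { url, options } = swiftObjectRequest(storageUrl, 'photos', 'elmo.png', token);
// const response = await fetch(url, options);
// const bytes = Buffer.from(await response.arrayBuffer());
```

Uploading with a PUT to the same URL and downloading with this GET keeps both operations on the native API, so the body should come back as the raw bytes that were stored.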

Setting Metadata in Google Cloud Storage (Export from BigQuery)

I am trying to update the metadata (programmatically, from Python) of several CSV/JSON files that are exported from BigQuery. The application that exports the data is the same as the one modifying the files (thus using the same server certificate). The export goes well until I try to use the objects.patch() method to set the metadata I want. The problem is that I keep getting the following error:
apiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/storage/v1/b/<bucket>/<file>?alt=json returned "Forbidden">
Obviously, this has something to do with bucket or file permissions, but I can't manage to get around it. If the same certificate is used for writing files and for updating file metadata, why am I unable to update it? The bucket was created with the same certificate.

If that's the exact URL you're using, it's a URL problem: you're missing the /o/ between the bucket name and the object name.
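For reference, the URL shape the answer describes can be sketched like this. `objectMetadataUrl` is a hypothetical helper; the object name is percent-encoded so that any slashes in it stay inside a single path segment after /o/.

```javascript
// Hypothetical helper: builds the JSON API URL for one object's metadata.
// Note the /o/ segment between the bucket name and the object name.
function objectMetadataUrl(bucket, objectName) {
  return 'https://www.googleapis.com/storage/v1/b/' +
    `${encodeURIComponent(bucket)}/o/${encodeURIComponent(objectName)}`;
}

console.log(objectMetadataUrl('my-bucket', 'exports/data.csv'));
// https://www.googleapis.com/storage/v1/b/my-bucket/o/exports%2Fdata.csv

// objects.patch sends a PATCH request to this URL with a JSON body such as
// { "metadata": { "source": "bigquery-export" } } (example values).
```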
