How to PUT file to Tika-server in NodeJs - node.js

The Scenario
I am running a VueJs client, a NodeJs Restify API Server, and a Tika-server out of the official Docker Image. A user makes a POST call with formData containing a PDF file to be parsed. The API server receives the POST call and I save the PDF on the server. The API server should PUT the file to the unpack/all endpoint on the Tika-server and receive a zip containing a text file, a metadata file, and the set of images in the PDF. I would then process the zip and pass some data back to the client.
The Problem
I create a buffer containing the file to be parsed using let parsingData = fs.createReadStream(requestFilename); or let parsingData = fs.readFileSync(requestFilename);, set the axios data field to parsingData, then make my request. When I get the response from the Tika-server, it seems the Tika-server has treated the request as empty; within the zip, there are no images, the TEXT file is empty, the METADATA.
When I make the following request to the Tika-server via CURL curl -T pdf_w_images_and_text.pdf http://localhost:9998/unpack/all -H "X-Tika-PDFExtractInlineImages: true" -H "X-Tika-PDFExtractUniqueInlineImagesOnly: true"> tika-response.zip, I get a response zip file containing accurate text, metadata, stripped images.
The Code
let parsingData = fs.createReadStream('pdf_w_images_and_text.pdf');
axios({
method: 'PUT',
url: 'http://localhost:9998/unpack/all',
data: parsingData,
responseType: 'arraybuffer',
headers: {
'X-Tika-PDFExtractInlineImages': 'true',
'X-Tika-PDFExtractUniqueInlineImagesOnly': 'true'
},
})
.then((response) => {
console.log('Tika-server response received');
const outputFilename = __dirname+'\\output.zip';
console.log('Attempting to convert Tika-server response data to ' + outputFilename);
fs.writeFileSync(outputFilename, response.data);
if (fs.existsSync(outputFilename)) {
console.log('Tika-server response data saved at ' + outputFilename);
}
})
.catch(function (error) {
console.error(error);
});
The Question
How do I encode and attach my file to my PUT request in NodeJs such that the Tika-server treats it as it does when I make the request through CURL?

Axios is sending the request with a content type of application/x-www-form-urlencoded and therefore the file content isn't being detected and parsed.
You can change this by passing either the known content type of the file, or a content type of application/octet-stream to allow Apache Tika Server to auto-detect.
Below is a sample based on your question's code that illustrates this:
#!/usr/bin/env node
const fs = require('fs')
const axios = require('axios')
let parsingData = fs.createReadStream('test.pdf');
axios({
method: 'PUT',
url: 'http://localhost:9998/unpack/all',
data: parsingData,
responseType: 'arraybuffer',
headers: {
'X-Tika-PDFExtractInlineImages': 'true',
'X-Tika-PDFExtractUniqueInlineImagesOnly': 'true',
'Content-Type': 'application/octet-stream'
},
})
.then((response) => {
console.log('Tika-server response received');
const outputFilename = __dirname+'/output.zip';
console.log('Attempting to convert Tika-server response data to ' + outputFilename);
fs.writeFileSync(outputFilename, response.data);
if (fs.existsSync(outputFilename)) {
console.log('Tika-server response data saved at ' + outputFilename);
}
})
.catch(function (error) {
console.error(error);
});
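The choice between a known content type and application/octet-stream can be sketched as a small helper. This is only an illustration: tikaHeadersFor and the KNOWN_TYPES table are hypothetical names, not part of axios or Tika, and the table is deliberately incomplete.

```javascript
const path = require('path');

// Hypothetical, deliberately small lookup table; extend it for the formats you expect.
const KNOWN_TYPES = {
  '.pdf': 'application/pdf',
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
};

// Hypothetical helper: builds the headers to send alongside the file body.
function tikaHeadersFor(filename) {
  const ext = path.extname(filename).toLowerCase();
  return {
    'X-Tika-PDFExtractInlineImages': 'true',
    'X-Tika-PDFExtractUniqueInlineImagesOnly': 'true',
    // Fall back to octet-stream so the server auto-detects the type itself.
    'Content-Type': KNOWN_TYPES[ext] || 'application/octet-stream',
  };
}
```

The returned object could then be passed as the headers option of the axios call above, for example headers: tikaHeadersFor('test.pdf').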

Related

Re-uploading a file after passing them into AWS lambdas(nodejs) + API gateway

I have a REST API created using AWS Lambdas and API gateway.
This API accommodates file uploads and I have a requirement to parse the uploaded files and then send those files individual to another source as form data.
When I send two images via postman and log the event body I get the following string. (It has two files)
----------------------------269453880547064499146449
Content-Disposition: form-data; name="file"; filename="smiling.png"
Content-Type: image/png
[binary PNG data]
----------------------------269453880547064499146449
Content-Disposition: form-data; name="file"; filename="smiling.png"
Content-Type: image/png
[binary PNG data]
----------------------------269453880547064499146449--
I used lambda-multipart-parser to parse this body and extract the meta data.
import parser from 'lambda-multipart-parser'
....
const result = await parser.parse(event)
The result above gives the files with the following type.
{
filename: string
content: Buffer
contentType: string
encoding: string
fieldname: string
}
The issue I have is that I have to send these files individually to another server as multipart/formdata.
Approaches I've tried so far.
Approach 1
Used a regex to split the event body on the boundary. The boundary in the example is ----------------------------269453880547064499146449.
const splitFileBody = event.body?.split(fileBoundary)
After splitting it I am left with a string.
// In this example I am only trying to send one file
formData.append('file', splitFileBody[0])
Outcome: The above approach gives me a 400 HTTP status code along with the following error:
body of your POST request is not well-formed multipart/form-data
Approach 2
Tried creating an object of type File in lambda using the parsed file as follows
const fileC = new File([file.content], file.fieldname, { type: file.contentType, lastModified: Number(new Date()) })
console.log('fileC', fileC)
const formData = new FormData()
formData.append('file', fileC)
Outcome : Lambda logs give File is not defined error.
Approach 3
Passed in the parsed information into form.append directly without new File
form.append('file', file.content, {
filename: file.filename,
contentType: file.contentType
});
Outcome: The above approach gives me a 400 HTTP status code along with the following error:
body of your POST request is not well-formed multipart/form-data
Are my approaches correct? If not, what should I do differently? If the approaches above are fine, which of them is better, and how do I avoid the errors?
Additional Info
I am using form-data package because I kept on getting FormData is not defined during the lambda runtime.
import FormData from 'form-data'
....
formData.append('file', file.content, {
filename: file.filename
})
I am using axios to make the post request
const res = await axios.post(URL, formData, {
headers: {
...formData.getHeaders(),
'content-length': file.content.length
}
})
The parsed file content can be directly appended to formData from form-data as follows
const result = await parser.parse(event)
const formData = new FormData();
for (const file of result.files) {
formData.append('file', file.content, {
filename: file.filename,
contentType: file.contentType
});
}
Additionally, the Content-Length header should have the value formData.getLengthSync(), as follows. The reason is that formData might contain other fields appended to it, and file.content.length only contains the size of the file, not the other data appended.
await axios.post(URL, formData, {
headers: {
...formData.getHeaders(),
'Content-Length': formData.getLengthSync()
}
})
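To see why file.content.length understates the request size, here is a hand-rolled sketch of a multipart body. It is for illustration only and is not how the form-data package is implemented; buildMultipartBody and the boundary string are made up for this example. Even with a single file, the boundary lines and part headers add bytes on top of the file content.

```javascript
// Hand-rolled multipart builder, for illustration only.
// In real code the form-data package produces the body for you.
function buildMultipartBody(boundary, files) {
  const chunks = [];
  for (const file of files) {
    // Each part starts with the boundary line and its own headers.
    chunks.push(Buffer.from(
      `--${boundary}\r\n` +
      `Content-Disposition: form-data; name="file"; filename="${file.filename}"\r\n` +
      `Content-Type: ${file.contentType}\r\n\r\n`
    ));
    chunks.push(file.content);
    chunks.push(Buffer.from('\r\n'));
  }
  // The closing boundary ends the whole body.
  chunks.push(Buffer.from(`--${boundary}--\r\n`));
  return Buffer.concat(chunks);
}

// Stand-in for one parsed file (512 zero bytes instead of a real image).
const file = { filename: 'smiling.png', contentType: 'image/png', content: Buffer.alloc(512) };
const body = buildMultipartBody('example-boundary', [file]);

console.log(file.content.length); // size of the file bytes only
console.log(body.length);         // size of what is actually sent
```

The second number is the one a Content-Length header has to match, which is what getLengthSync() reports for a form-data instance.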

Next.js server-side POST response returns a corrupted zip file from an external API, but the API request works fine with Postman

I have an API with an endpoint that returns an xlsx file on a POST request.
When I call that API from a Next.js server-side API route, it returns a corrupted zip file,
but when I call it directly through Postman it returns the expected xlsx file.
The call to the API from Next.js looks like this:
axios.post(`${process.env.API_URI}`, formData, {
headers: {
...formData.getHeaders(),
},
responseType:"blob"
}).then(response => {
res.status(200).send(response.data)
tmpObj.removeCallback()
}).catch(err => {
console.log(err)
tmpObj.removeCallback()
})
Is there a proper way to receive the xlsx file in a Next.js API route (Node.js)?
Update:
Eventually I had to set the Content-Type in the axios headers and the responseType in the axios config object:
axios.post(`${process.env.API_URI}`, formData, {
headers: {
...formData.getHeaders(),
'Content-Type': 'blob',
},
responseType:"arraybuffer"
}).then(response => {
// create the buffer in the frontend: const buffer = Buffer.from(response.data, 'base64');
res.setHeader('Content-Type', 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
// res.setHeader('Content-Disposition', 'attachment;filename=SheetJSNode.xlsx')
res.status(200).send(response.data)
}).catch(err => {
console.log(err)
})
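A likely reason the arraybuffer response type matters: if a binary body is decoded as text at any point, bytes that are not valid UTF-8 are replaced and cannot be recovered. The sketch below shows that effect with plain Node.js buffers. It does not involve axios, and the six sample bytes are made up for the example.

```javascript
// First bytes of a zip/xlsx file followed by two bytes that are not valid UTF-8.
const original = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0xfe]);

// What happens when the body is decoded as text and later written back out.
const viaString = Buffer.from(original.toString('utf8'), 'utf8');

console.log(original.equals(viaString));             // false: the invalid bytes were replaced
console.log(original.equals(Buffer.from(original))); // true: raw bytes survive a copy
```

Keeping the response as raw bytes from the HTTP client through to res.send() avoids this round trip.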

NodeJS server side - file Expected UploadFile, received: <class 'str'>

I'm having issues uploading a file from the Node.js server side. I have found 100 posts and done a lot of research, but nothing works. I would appreciate any help.
Structure of the app
Front end: React Admin framework. It receives the file, and I encode the image content in base64 to send it to the API.
Backend: NestJS app. It receives the base64 image in the API.
From my backend API I need to send the file to an external backend (a Python API) for upload. This is where the problem is.
Please see my code below; something is wrong with the file on the JS side.
I have tried several methods and all of them end in the same error.
Solution 1
Convert the base64 image to a buffer and send it to the external backend to upload the file.
I have also tried passing cleanImageBuffer, but nothing changed.
import axios from 'axios';
import FormData from 'form-data';
export async function upload(
fileBase64: string,
filename: string
): Promise<any> {
const buffer = Buffer.from(fileBase64, 'base64')
const extension = fileBase64.substring(fileBase64.indexOf('/') + 1, fileBase64.indexOf(";base64"))
const cleanBase64 = fileBase64.replace(/^data:image\/png;base64,/, '')
const cleanImageBuffer = Buffer.from(cleanBase64, 'base64')
const formData = new FormData();
// have tried to pass as well cleanImageBuffer but no changes
formData.append('file', buffer);
formData.append('fileName', filename + '.' + extension);
formData.append('namespace', 'test');
return await axios
.post('external_api_url', JSON.stringify(formData), {
headers: {
Authorization: `Bearer token`,
ContentType: 'multipart/form-data'
}
})
.then((response) => {
console.log('response = ' + JSON.stringify(response))
})
Result of solution 1
{
"status": "error",
"error": {
"code": "bad_request",
"message": "file Expected UploadFile, received: <class 'str'>"
}
}
Solution 2
Save the received base64 image to disk,
then create a stream and send the image.
export async function upload (
fileBase64: string,
filename: string
): Promise<any> {
const extension = fileBase64.substring(fileBase64.indexOf('/') + 1, fileBase64.indexOf(";base64"))
const cleanBase64 = fileBase64.replace(/^data:image\/png;base64,/, '')
const TMP_UPLOAD_PATH = '/tmp'
if (!fs.existsSync(TMP_UPLOAD_PATH)) {
fs.mkdirSync(TMP_UPLOAD_PATH);
}
fs.writeFile(TMP_UPLOAD_PATH + '/' + filename + '.' + extension, cleanBase64, 'base64', function(err) {
console.log(err);
})
const fileStream = fs.createReadStream(TMP_UPLOAD_PATH + '/' + filename + '.' + extension)
const formData = new FormData();
formData.append('file', fileStream, filename + '.' + extension);
formData.append('fileName', filename + '.' + extension);
formData.append('namespace', 'test');
return await axios
.post('external_api_url', formData, {
headers: {
Authorization: `Bearer token`,
ContentType: 'multipart/form-data'
}
})
.then((response) => {
console.log('response = ' + JSON.stringify(response))
})
}
Result of solution 2
{
"status": "error",
"error": {
"code": "bad_request",
"message": "file Expected UploadFile, received: <class 'str'>"
}
}
Other attempts that ended in the same result
I tried fetch from node-fetch: same result.
I found out that some people had this issue with an outdated version of axios. I installed the latest axios version, 1.1.3, but got the same result.
The best scenario for my needs
Receive the base64 image,
convert it to a buffer, and send the file to the external Python API, so as to avoid saving the file on local disk.
I would appreciate any help.
Below is a Python example that works; nothing I tried in JS works.
import requests
url = "http://127.0.0.1:8000/external_api"
payload={'namespace': 'test'}
files=[
('file',('lbl-pic.png',open('/local/path/lbl-pic.png','rb'),'image/png'))
]
headers = {
'Authorization': 'Bearer token'
}
response = requests.request("POST", url, headers=headers, data=payload, files=files)
print(response.text)
Just a suggestion:
Here's the line which returns the mentioned error: https://github.com/tiangolo/fastapi/blob/41735d2de9afbb2c01541d0f3052c718cb9f4f30/fastapi/datastructures.py#L20. You might find it useful.
1. First see if you can make it work with a regular HTML file input (don't complicate it with Base64 yet), as described here: https://stackoverflow.com/a/70824288/2347084
2. If (1) works, then try converting the base64 into a File object as suggested here: https://stackoverflow.com/a/47497249/2347084
3. Combine (2) and (1).
I want to post the solution that worked for me, because from what I can see online, many people have issues with FormData on Node.js.
I was using axios to send the buffer for the file upload.
The issue is with axios, and specifically with FormData: it does not add Content-Length to the headers, and no version of axios I tried does this.
The Python API required Content-Length.
So I asked for this header to be made optional in the Python API, and the code started to work.
The solution, if anybody runs into similar issues:
axios does not add Content-Length when working with FormData (I could not find any version of axios that does).
If you work with a buffer without having the file on local disk, there will be an issue because of Content-Length.
If you have the file locally, then using the fs module you can read the file and add all the headers, including Content-Length.
The axios GitHub issue says this bug is fixed in the latest axios, but it still did not work in my case.
Below is the code using a buffer, for when Content-Length is not required by the third-party API.
function upload (image: {imageBase64: string, fileName: string}) {
const { imageBase64, fileName } = image;
const cleanBase64 = imageBase64.substr(imageBase64.indexOf(',') + 1);
// buffer should be clean base64
const buffer = Buffer.from(cleanBase64, 'base64');
const formData = new FormData();
// the filename option is required; otherwise the API reports that the received file is a string, not an UploadFile
formData.append('file', buffer, { filename: fileName });
return client
.post('url', formData, {
headers: {
...formData.getHeaders(),
},
})
.then((response) => response.data)
.catch((error) => {
return {
status: 'error',
error,
};
});
}
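The "clean base64" step above can be sketched as a standalone helper. The data: prefix is not part of the base64 payload, so it has to be removed before decoding. parseDataUrl is a hypothetical name, and deriving the extension from the MIME type is a naive assumption that only suits simple types such as image/png.

```javascript
// Hypothetical helper: splits a base64 data URL into its MIME type and raw bytes.
function parseDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,(.*)$/s.exec(dataUrl);
  if (!match) {
    throw new Error('Not a base64 data URL');
  }
  const [, mimeType, base64] = match;
  return {
    mimeType,
    // Naive: "image/png" gives "png"; types like "image/svg+xml" need extra handling.
    extension: mimeType.split('/')[1],
    // Decode only the payload, never the "data:...;base64," prefix.
    buffer: Buffer.from(base64, 'base64'),
  };
}
```

The resulting buffer and a filename built from the extension could then be passed to formData.append('file', buffer, { filename }) as in the code above.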

How can I send post request with base64 image?

I am making an image upload component in vue js with custom cropping option. The cropped version is being saved in my state as a base64 string. This is it:
....
Now I am trying to send this image to my Node.js server using a POST request. In Postman, I select "raw" and "JSON" and write the body this way:
{
"image" : ".....
}
The request is not detecting this JSON data in the body and returns this error:
{
"image": "\"image\" is required"
}
I also tried sending it as form data this way:
var axios = require('axios');
var FormData = require('form-data');
// var fs = require('fs');
var data = new FormData();
data.append('image', formdata.logoFinalImage);
var config = {
method: 'post',
url: myurl,
headers: {
'Authorization': this.state.token,
'Content-Type': 'application/json'
},
data: data
};
axios(config)
.then(function (response) {
console.log(JSON.stringify(response.data));
})
.catch(function (error) {
console.log(error);
});
Same issue.
How can I send the final cropped version to the node api endpoint?
Solved the issue. There were two ways of doing it. One requires changes in the backend so that the code can receive base64 and convert it to an image. Reference: https://medium.com/js-dojo/how-to-upload-base64-images-in-vue-nodejs-4e89635daebc
The other is to turn the base64 image into a file, and then send it to the backend as form-data. I used this one for my case. Reference for this solution: https://gist.github.com/ibreathebsb/a104a9297d5df4c8ae944a4ed149bcf1
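A minimal sketch of the second approach, assuming the cropped image is a base64 data URL and the backend expects a multipart field called image. The function names are hypothetical and the code is not taken from the linked gist. It relies on the standard atob, Blob, and FormData globals, which browsers provide (recent Node.js versions do as well).

```javascript
// Hypothetical helper: turns a base64 data URL into a Blob with the right MIME type.
function dataUrlToBlob(dataUrl) {
  const [header, base64] = dataUrl.split(',');
  // header looks like "data:image/png;base64"
  const mimeType = header.slice('data:'.length, header.indexOf(';'));
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Blob([bytes], { type: mimeType });
}

// Hypothetical helper: wraps the image in form data under the "image" field.
function buildImageForm(dataUrl, filename) {
  const form = new FormData();
  // Passing a filename makes the part a file upload rather than a text field.
  form.append('image', dataUrlToBlob(dataUrl), filename);
  return form;
}
```

When the form is sent with axios from the browser, the Content-Type header is best left unset so that the multipart boundary is filled in automatically; a manual 'application/json' header, as in the attempt above, would not match the body.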
If it's working in Postman, then you can generate the code from Postman itself: select Code and search for axios.

Get image from Axios and send as Form Data to Wordpress API in a Cloud Function

What I'm trying to accomplish is using a Firebase Cloud Function (Node.js) to:
First download an image from a URL (e.g. from unsplash.com) using an axios.get() request
Secondly take that image and upload it to a Wordpress site using the Wordpress Rest API
The problem seems (to me) to be that the formData doesn't actually append any data, even though the axios.get() request does seem to retrieve a buffered image. Maybe I'm doing something wrong with the Node.js library form-data, or maybe I get the image in the wrong encoding? This is my best (but unsuccessful) attempt:
async function uploadMediaToWordpress() {
var FormData = require("form-data");
var formData = new FormData();
var response = await axios.get(
"https://images.unsplash.com/photo-1610303785445-41db41838e3e?ixid=MXwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHw%3D&ixlib=rb-1.2.1&auto=format&fit=crop&w=634&q=80",
{ responseType: "arraybuffer" }
);
formData.append("file", response.data);
try {
var uploadedMedia = await axios.post("https://wordpresssite.com/wp-json/wp/v2/media",
formData, {
headers: {
"Content-Disposition": 'form-data; filename="example.jpeg"',
"Content-Type": "image/jpeg",
Authorization: "Bearer <jwt_token>",
},
});
} catch (error) {
console.log(error);
throw new functions.https.HttpsError("failed-precondition", "WP media upload failed");
}
return uploadedMedia.data;
}
I have previously successfully uploaded an image to Wordpress with Javascript in a browser like this:
async function uploadMediaToWordpress() {
let formData = new FormData();
const response = await fetch("https://images.unsplash.com/photo-1610303785445-41db41838e3e?ixid=MXwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHw%3D&ixlib=rb-1.2.1&auto=format&fit=crop&w=634&q=80");
const blob = await response.blob();
const file = new File([blob], "image.jpeg", { type: blob.type });
formData.append("file", file);
var uploadedMedia = await axios.post("https://wordpresssite.com/wp-json/wp/v2/media",
formData, {
headers: {
"Content-Disposition": 'form-data; filename="example.jpeg"',
"Content-Type": "image/jpeg",
Authorization: "Bearer <jwt_token>",
},
});
return uploadedMedia.data;
},
I have tried the last couple of days to get this to work but cannot for the life of me seem to get it right. Any pointer in the right direction would be greatly appreciated!
The "regular" JavaScript code (used in a browser) works because the image is sent as a file (see the new File in your code), but your Node.js code is not really doing that. For example, the Content-Type value is wrong; it should be multipart/form-data; boundary=----...... Nonetheless, instead of trying (hard) with the arraybuffer response, I suggest you use a stream, just as in the axios documentation and form-data documentation.
So in your case, you'd want to:
Set stream as the responseType:
axios.get(
'https://images.unsplash.com/photo-1610303785445-41db41838e3e?ixid=MXwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHw%3D&ixlib=rb-1.2.1&auto=format&fit=crop&w=634&q=80',
{ responseType: 'stream' }
)
Use formData.getHeaders() in the headers of your file upload request (to the /wp/v2/media endpoint):
axios.post( 'https://wordpresssite.com/wp-json/wp/v2/media', formData, {
headers: {
...formData.getHeaders(),
Authorization: 'Bearer ...'
},
} )
And because the remote image from Unsplash.com does not use a static name (e.g. image-name.jpg), you'll need to set the name when you call formData.append():
formData.append( 'file', response.data, 'your-custom-image-name.jpeg' );
I hope that helps. This worked fine for me (using the node command for Node.js version 14.15.4, the latest release as of writing).
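Put together, the three steps might look like the sketch below. It is an outline under assumptions rather than a drop-in implementation: uploadRemoteImage is a hypothetical name, http is expected to be axios (or anything with the same get/post shape), and FormDataCtor is expected to be the constructor exported by the form-data package. Both are passed in as arguments so the flow can be tried without network access.

```javascript
// Sketch: download an image as a stream and re-upload it as multipart/form-data.
// `http` and `FormDataCtor` are injected; in real use they would be
// require('axios') and require('form-data').
async function uploadRemoteImage({ http, FormDataCtor, imageUrl, uploadUrl, token, filename }) {
  // 1. Ask for a stream instead of an arraybuffer.
  const download = await http.get(imageUrl, { responseType: 'stream' });

  // 2. Append the stream with an explicit filename.
  const formData = new FormDataCtor();
  formData.append('file', download.data, filename);

  // 3. Let form-data generate the multipart Content-Type, including the boundary.
  const upload = await http.post(uploadUrl, formData, {
    headers: {
      ...formData.getHeaders(),
      Authorization: `Bearer ${token}`,
    },
  });
  return upload.data;
}
```

In the Cloud Function this would be called with the Unsplash URL as imageUrl, the /wp-json/wp/v2/media endpoint as uploadUrl, and a filename such as 'your-custom-image-name.jpeg'.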

Resources