Simplified question
Why, when using express.js and request.js, do the following two examples:
request.get(url)
  .on('response', (requestjsResponse) => {
    requestjsResponse.pipe(res);
  })
and
request.get(url, (err, requestjsResponse, requestjsBody) => {
  res.send(requestjsBody)
})
tend not to produce the same results, even when requestjsBody contains the expected content?
Detailed question
I have two express.js versions of a route handler that handle file-proxying procedures for multiple file types. The code uses the standard express.js req/res/next notation. The important non-code background for this issue is that the two most commonly returned types are handled as follows:
PDF: shall be opened within the browser; their size is usually no less than 18K (according to the content-length header)
EML: shall be downloaded; their size is usually smaller than 16K (according to the content-length header)
Both handler versions use request.js. One uses the
get(url: string, callback: (Error, Response, Body) => void)
form, which I'll refer to as the callback form, where the entire body is expected inside the callback. In this case, the response is sent to the user by a plain express.js res.send(Body). The other one uses the
get(url: string).on(event: 'response', callback: (request.Response) => void)
form, which I'll refer to as the event/pipe form; it transfers the response to the end user by piping it via request.Response.pipe(res) inside the 'response' handler. Details are provided in the code listing.
I'm unable to find the difference between those two forms, but:
In the case of .eml files (MIME message/rfc822; you can treat them as fancy HTML), both versions work exactly the same way: the file is nicely downloaded.
In the case of .pdf, when using the event/pipe form get(url).on('response', callback), I'm able to successfully transfer the PDF document to the client. When I'm using the callback form (i.e. get(url: string, callback: (Error, Response, Body) => void)), even when I'm peeking at the body in the debugger (it seems to be a complete PDF: it contains the PDF header, EOF marker, etc.), the client receives only some strange preamble declaring HTML:
<!doctype html><html><body style='height: 100%; width: 100%; overflow: hidden; margin:0px; background-color: rgb(82, 86, 89);'><embed style='position:absolute; left: 0; top: 0;'width='100%' height='100%' src='about:blank' type='application/pdf' internalid='FD93AFE96F19F67BE0799686C52D978F'></embed></body></html>
but no PDF document is received afterwards. Chrome reports that it was unable to load the document.
Please see code:
Non-working callback version:
request.get(url, (err, documentResponse, documentBody) => {
  if (err) {
    logger.error('Document Fetch error:');
    logger.error(err);
  } else {
    const documentResponseContentLength = Number.parseInt(documentResponse.headers['content-length'], 10);
    if (documentResponseContentLength === 0 || Number.isNaN(documentResponseContentLength)) {
      logger.warn('No content provided for requested document or length header malformed');
      return res.redirect(get404Navigation());
    }
    if (mimetype === 'application/pdf') {
      logger.info(' overwriting Headers (PDF)');
      res.set('content-type', 'application/pdf');
      res.set('content-disposition', 'inline; filename="someName.pdf"');
      logger.info('Document Download Headers (overridden):', res.getHeaders());
    }
    if (mimetype === 'message/rfc822') {
      logger.info(' overwriting Headers (message/rfc822)');
      res.set('content-type', 'message/rfc822');
      res.set('content-disposition', 'attachment; filename="someName.eml"');
      logger.info('Document Download Headers (overridden):', res.getHeaders());
    }
    res.send(documentBody); /* Sending message to client */
  }
})
  .on('data', (d) => {
    console.log('We are debugging here');
  });
Working event-based/piped version:
const r = request
  .get(url)
  .on('response', (documentsResponse) => {
    if (Number.parseInt(documentsResponse.headers['content-length'], 10) !== 0) {
      // Overwrite headers for PDF and TIFF; these occasionally arrive incomplete
      if (mimetype === 'application/pdf') {
        logger.info(' overwriting Headers (PDF)');
        res.set('content-type', 'application/pdf');
        res.set('content-disposition', 'inline; filename="someName.pdf"');
        logger.info('Document Download Headers (overridden):', documentsResponse.headers);
      }
      if (mimetype === 'message/rfc822') {
        logger.info(' overwriting Headers (message/rfc822)');
        res.set('content-type', 'message/rfc822');
        res.set('content-disposition', 'attachment; filename="someName.eml"');
        logger.info('Document Download Headers (overridden):', res.getHeaders());
      }
      r.pipe(res); /* Response is piped to client */
    } else {
      res.redirect(get404Navigation());
    }
  })
  .on('data', (d) => {
    console.log('We are debugging here');
  });
Even though the part with r.pipe(res) seems extra suspicious (see where r is declared and where it is used), this is the version that works correctly for both cases.
I assume the issue might be caused by the nature of sending multipart content, so I added additional on('data', (d) => {}) callbacks and set breakpoints to see when the response is ended/piped vs. when the data handler is called, and the results match my expectations:
In the request(url, (err, response, body)) case, the data handler is called twice before the callback executes; the entire body is accessible inside the handler, so it's even more obscure to me that I'm unable to just res.send it.
In the request.get(url).on('response') case, the piping to res happens first, then the data handler is called twice. I believe the internals of the Node.js HTTP engine are doing the asynchronous trick and pushing response chunks one after another as each is received.
I'll be glad for any explanation of what I'm doing wrong and what I can align to make my callback version work as expected for the PDF case.
Epilogue:
Why is such code used? Our backend retrieves PDF data from an external server that is not exposed to the public internet, but due to legacy reasons some headers are set incorrectly (mainly Content-Disposition), so we intercept them and act as a kind of alignment proxy between the data source and the client.
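A likely culprit, for anyone comparing the two forms: in the callback form, request decodes the body to a UTF-8 string by default, which corrupts binary payloads such as PDFs; passing encoding: null makes it hand back a Buffer instead. A minimal sketch of that variant, reusing url, mimetype, and res from the handlers above:
request.get({ url, encoding: null }, (err, documentResponse, documentBody) => {
  if (err) {
    return res.redirect(get404Navigation());
  }
  // documentBody is now a Buffer rather than a UTF-8 string,
  // so the binary bytes reach res.send() intact
  res.set('content-type', mimetype);
  res.send(documentBody);
});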
Is something like this even possible, or are there better ways to do this? Is what I'm doing even a good idea, or is it a bad approach?
What I want to do is upload a file to my Node.js server. Along with the file I want to send some metadata. The metadata will determine whether the file can be saved and the upload accepted, or whether it should be rejected with a 403 response.
I am using busboy and I am sending FormData from my client side.
The example below is very much simplified:
Here is a snippet of the client side code.
I am appending the file as well as the metadata to the form:
const formData = new FormData();
formData.append('name', JSON.stringify({name: "John Doe"}));
formData.append('file', this.selectedFile, this.selectedFile.name);
Here is the nodejs side:
const busboy = require('busboy');
const fs = require('fs');

exports.Upload = async (req, res) => {
  try {
    var acceptUpload = false;
    const bb = busboy({ headers: req.headers });
    bb.on('field', (fieldname, val) => {
      // Verify data here before accepting the file upload
      var data = JSON.parse(val);
      if (data.name === 'John Doe') {
        acceptUpload = true;
      } else {
        acceptUpload = false;
      }
    });
    bb.on('file', (fieldname, file, filename, encoding, mimetype) => {
      if (acceptUpload) {
        const saveTo = '/upload/file.txt';
        file.pipe(fs.createWriteStream(saveTo));
      } else {
        const response = {
          message: 'Not Authorized'
        };
        res.status(403).json(response);
      }
    });
    bb.on('finish', () => {
      const response = {
        message: 'Upload Successful'
      };
      res.status(200).json(response);
    });
    req.pipe(bb);
  } catch (error) {
    console.log(error);
    const response = {
      message: error.message
    };
    res.status(500).json(response);
  }
};
So basically, is it even possible for the 'file' event handler to wait for the 'field' event handler? How could one verify some metadata before accepting a file upload?
How can I do validation of all the data in the FormData object before accepting the file upload? Is this even possible, or are there other ways of uploading files with this kind of behaviour? I am even considering adding the data to the request header, but this does not seem like the ideal solution.
Update
As I suspected, nothing is waiting. Whichever way I try, the upload first has to complete; only afterwards is it rejected with a 403.
Another Update
I've tried the same thing with multer and have similar results. Even when I can do the validation, the file is completely uploaded from the client side. Only once the upload is complete is the request rejected. The file, however, never gets stored, even though it is uploaded in its entirety.
With busboy, nothing is written to the server if you do not execute the statement file.pipe(fs.createWriteStream(saveTo));
You can prevent more data from even being uploaded to the server by executing the statement req.destroy() in the .on("field", ...) or the .on("file", ...) event handler, even after you have already evaluated some of the fields. Note however, that req.destroy() destroys not only the current HTTP request but the entire TCP connection, which might otherwise have been reused for subsequent HTTP requests. (This applies to HTTP/1.1, in HTTP/2 the relationship between connections and requests is different.)
At any rate, it has no effect on the current HTTP request if everything has already been uploaded. Therefore, whether this saves any network traffic depends on the size of the file. And if the decision whether to req.destroy() involves an asynchronous operation, such as a database lookup, then it may also come too late.
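A minimal sketch of that early abort, assuming busboy v1 and the field layout from the question (the name field precedes the file part, and FormData preserves that order):
const busboy = require('busboy');
const fs = require('fs');

exports.Upload = (req, res) => {
  const bb = busboy({ headers: req.headers });
  bb.on('field', (fieldname, val) => {
    if (fieldname === 'name' && JSON.parse(val).name !== 'John Doe') {
      // Destroy the request as soon as a field fails validation, so the
      // rest of the upload is never transferred. The client then sees a
      // reset connection rather than a 403, as in the curl transcripts below.
      req.destroy();
    }
  });
  bb.on('file', (fieldname, file) => {
    file.pipe(fs.createWriteStream('/upload/file.txt'));
  });
  req.pipe(bb);
};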
Compare
> curl -v -F name=XXX -F file=@<small file> http://your.server
* We are completely uploaded and fine
* Empty reply from server
* Closing connection 0
curl: (52) Empty reply from server
with
> curl -v -F name=XXX -F file=@<large file> http://your.server
> Expect: 100-continue
< HTTP/1.1 100 Continue
* Send failure: Connection was reset
* Closing connection 0
curl: (55) Send failure: Connection was reset
Note that the client sets the Expect header before uploading a large file. You can use that fact, in connection with a special request header called name, to block the upload completely:
http.createServer(app)
  .on("checkContinue", function (req, res) {
    if (req.headers["name"] === "John Doe") {
      res.writeContinue(); // sends HTTP/1.1 100 Continue
      app(req, res);
    } else {
      res.statusCode = 403;
      res.end("Not authorized");
    }
  })
  .listen(...);
But for small files, which are uploaded without the Expect request header, you still need to check the name header in the app itself.
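For completeness, a hedged sketch of the client side for this scheme; the header called name and the /upload path are assumptions mirroring the examples above:
const formData = new FormData();
formData.append('name', JSON.stringify({ name: 'John Doe' }));
formData.append('file', selectedFile, selectedFile.name);

// Duplicate the validation data into a request header so the server can
// inspect it in the checkContinue handler (or, for small uploads without
// Expect: 100-continue, in the app itself) before processing the body.
fetch('http://your.server/upload', {
  method: 'POST',
  headers: { name: 'John Doe' },
  body: formData,
});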
I have a server written in Node.js. The client sends a GET request via fetch with the URL of an mp3 file that is among the files on the server. My goal is to send the mp3 file back to the client so that it can be played. I wrote something like this:
if (req.url.indexOf(".mp3") != -1) {
fs.readFile(__dirname + decodeURI(req.url), function (error, data) {
res.setHeader("Access-Control-Allow-Origin", "*");
res.writeHead(200, {
"Content-type": "audio/mpeg",
});
res.write(data);
res.end();
})
}
but I get this error: Uncaught (in promise) SyntaxError: Unexpected token I in JSON at position 0
Also, here is the client-side code:
fetch("http://localhost:3000/static/mp3/" + value, { method: "get" })
.then((response) => response.json())
.then((data) => (this.song = data));
document.getElementById("audio_src").src =
"http://localhost:3000/" + this.song;
In the client, you're calling response.json(), but the response you're getting back is NOT JSON. The data you're getting back is binary. Perhaps you should be calling response.blob()?
But then you're trying to put the binary data into a URL as text. And you're not handling the asynchronous nature of fetch() properly either. No, this is not the way to do things. You could create a data-encoded URL, but there's really no point in doing it that way, since the audio element in the HTML page can fetch the MP3 from the URL by itself.
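If you really did want to fetch the bytes manually, a hedged sketch of that route would be to read the response as a blob and hand the audio element an object URL:
fetch("http://localhost:3000/static/mp3/" + value)
  .then((response) => response.blob())
  .then((blob) => {
    // Wrap the binary response in an object URL the audio element can play
    document.getElementById("audio_src").src = URL.createObjectURL(blob);
  });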
I might suggest something simpler in the client:
document.getElementById("audio_src").src = "http://localhost:3000/static/mp3/" + value;
And let the browser's HTML tag go get the MP3 for you. I'm assuming that the element represented by audio_src is something that knows how to play MP3 audio sources on its own. If so, that means you just give it the URL and it will go fetch it and play it on its own.
I (maybe falsely) assumed Lambda@Edge can modify origin response content, so I wrote a lambda function like this:
/* this does not work. response.Body is not defined */
'use strict';
exports.handler = (event, context, callback) => {
  var response = event.Records[0].cf.response;
  var data = response.Body.replace(/OLDTEXT/g, 'NEWTEXT');
  response.Body = data;
  callback(null, response);
};
This fails because you cannot reference the origin response body with this syntax.
Can I modify this script to make it work as I intended, or should I consider using another service on AWS?
My background:
We are trying to set up an AWS CloudFront distribution that consolidates access to several websites, like this:
ttp://foo.com/ -> https:/newsite.com/foo/
ttp://bar.com/ -> https:/newsite.com/bar/
ttp://boo.com/ -> https:/newsite.com/boo/
The sites are currently managed by external parties. We want to disable direct public access to foo/bar/boo and have newsite.com be the only site visible on the internet.
Mapping the origins into a single CloudFront distribution is relatively simple. However, doing so will break HTML content that specifies files with absolute URLs, once their current domain names are removed from the web.
ttp://foo.com/images/1.jpg
-> (disable foo.com dns)
-> image not found
To benefit from CloudFront caching and other merits, I want to modify/rewrite all absolute file references in HTML files to relative URLs, so
<img src="ttp://foo.com/images/1.jpg">
becomes
<img src="/foo/images/1.jpg">
//(accessed as https:/newsite.com/foo/images/1.jpg from a user)
//(maybe I should make it an absolute url for SEO purposes)
(http is changed to ttp, due to restrictions on using the banned domain name foo.com)
(edit)
I found this AWS blog, which may be a great hint but feels a little too convoluted for what I expected (it sets up a Linux container so one can just use sed to process HTML files, maybe using S3 as temporary storage). I hope I can find a simpler way:
https://aws.amazon.com/blogs/networking-and-content-delivery/resizing-images-with-amazon-cloudfront-lambdaedge-aws-cdn-blog/
From what I have just learnt myself, you unfortunately cannot modify the response body within a Lambda@Edge function. You can only wipe out or totally replace the body content. I was hoping to be able to clean all responses from a legacy site, but using a CloudFront Lambda@Edge function will not allow this to be done.
As the AWS documentation states here:
When you’re working with the HTTP response, Lambda#Edge does not expose the body that is returned by the origin server to the origin-response trigger. You can generate a static content body by setting it to the desired value, or remove the body inside the function by setting the value to be empty. If you don’t update the body field in your function, the original body returned by the origin server is returned back to viewer.
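So wholesale replacement is allowed, but editing is not. A minimal sketch of what an origin-response trigger may legitimately do:
'use strict';
exports.handler = (event, context, callback) => {
  const response = event.Records[0].cf.response;
  // You may SET the body to a new static value, but the original body
  // returned by the origin is never exposed to the function.
  response.body = 'Replacement content generated at the edge';
  callback(null, response);
};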
I ran into the same issue and have been able to pull some info out of the request headers to piece together a URL from which I can fetch the original body.
Beware: I haven't yet been able to confirm that this is a "safe" method; maybe it's relying on undocumented behaviour, etc., but for now it DOES fetch the original body properly for me. Of course, it also takes another request/round trip, possibly incurring some extra transfer costs, execution time, etc.
const https = require('https');

const fetchOriginalBody = (request) => {
  const host = request['headers']['host'][0]['value']; // xxxx.yyy.com
  const uri = request['uri'];
  const fetchOriginalBodyUrl = 'https://' + host + uri;
  return httpsRequest(fetchOriginalBodyUrl);
};

// Helper that turns https.request into a promise
function httpsRequest(options) {
  return new Promise((resolve, reject) => {
    const req = https.request(options, (res) => {
      if (res.statusCode < 200 || res.statusCode >= 300) {
        return reject(new Error('statusCode=' + res.statusCode));
      }
      var body = [];
      res.on('data', function (chunk) {
        body.push(chunk);
      });
      res.on('end', function () {
        try {
          body = Buffer.concat(body).toString();
          // body = JSON.parse(Buffer.concat(body).toString());
        } catch (e) {
          reject(e);
        }
        resolve(body);
      });
    });
    req.on('error', (e) => {
      reject(e.message);
    });
    req.end();
  });
}
exports.handler = async (event, context, callback) => {
  const records = event.Records;
  if (records && records.length > 0) {
    const request = records[0].cf.request;
    const body = await fetchOriginalBody(request);
  }
  ...
Yeah, I kinda didn't know how to type the title well...
I've a Node server which receives an image via a POST form. I then want to send this image to Microsoft Vision and the equivalent Google service in order to gather information from both, do some stuff, and return a result to the user that has accessed my server.
My problem is: how do I send the actual data?
This is the actual code that takes care of that:
const microsofComputerVision = require("microsoft-computer-vision");

module.exports = function (req, res) {
  var file;
  if (req.files) {
    file = req.files.file;
    // Everything went fine
    microsofComputerVision.analyzeImage({
      "Ocp-Apim-Subscription-Key": vision_key,
      "content-type": "multipart/form-data",
      "body": file.data.toString(),
      "visual-features": "Tags, Faces",
      "request-origin": "westcentralus"
    }).then((result) => {
      console.log("A");
      res.write(result);
      res.end();
    }).catch((err) => {
      console.log(err);
      res.writeHead(400, { 'Content-Type': 'application/json' });
      res.write(JSON.stringify({ error: "The request must contain an image" }));
      res.end();
    });
  } else {
    res.writeHead(400, { 'Content-Type': 'application/octet-stream' });
    res.write(JSON.stringify({ error: "The request must contain an image" }));
    res.end();
  }
};
If instead of calling "analyzeImage" i do the following
res.set('Content-Type', 'image/jpg')
res.send(file.data);
res.end();
The browser renders the image correctly, which made me think "file.data" contains the actual file (considered it's of type buffer).
But apparently Microsoft does not agree with that, because when i send the request to computer vision i get the following response:
"InvalidImageFormat"
The only examples I found are here, and the "data" used in that example comes from a file-system read, not straight from a request. But saving the file only to load it and then delete it looks like a horrible workaround to me, so I'd rather know in what form and how I should work on the "file" I have, to send it correctly for the API call.
Edit: if I use file.data (which I thought was the most correct, since it would be sending the raw image as the body) I get an error saying I must use a string or a buffer as content. So apparently file.data is not a buffer in the way "body" requires, which honestly I don't understand.
Solved. The request did ask for a buffer, and file.data is indeed a buffer, so sending file.data was correct all along; the errors occurred every time I tried toString() on it, in which case the request wasn't accepted. After checking file.data's type in every possible way, I started looking for other problems, and the actual error was, forgive my being stupid, too stupid to be evident: the result was JSON, and res.write didn't accept a JSON object as its argument.
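In other words, the working combination is roughly the sketch below; the application/octet-stream content type is an assumption for a raw-buffer body, everything else mirrors the code above:
microsofComputerVision.analyzeImage({
  "Ocp-Apim-Subscription-Key": vision_key,
  "content-type": "application/octet-stream", // assumption: raw binary body
  "body": file.data, // raw Buffer, NOT file.data.toString()
  "visual-features": "Tags, Faces",
  "request-origin": "westcentralus"
}).then((result) => {
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.write(JSON.stringify(result)); // stringify: res.write rejects plain objects
  res.end();
});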
This is how I did it with the Amazon Rekognition image classifier. I know it's not the same service you're using; hoping this helps a little, though:
// `route`, `upload` (multer) and the configured `rekognition` client
// are assumed to be set up elsewhere in the module.
const fs = require('fs');

const imagePath = `./bat.jpg`;
const bitmap = fs.readFileSync(imagePath);
const params = {
  Image: { Bytes: bitmap },
  MaxLabels: 10,
  MinConfidence: 50.0
};

route.post('/', upload.single('image'), (req, res) => {
  rekognition.detectLabels(params, function (err, data) {
    if (err) {
      console.log('error');
    } else {
      console.log(data);
      res.json(data);
    }
  });
});
I am using Node to get Instagram images and have come across an edge case I am interested in solving. While using the oembed API call, I can get the thumbnail_url.
In my service, I return the image URL and move on. The issue is that for certain permalinks (carousel/albums), this thumbnail_url loads with a 5xx Instagram error.
What I would like to do is verify that the image URL loads an image, and if not, do something else instead of returning it.
I know what the "something else" is. My problem is that the URL I get back from the oembed call is indeed a valid URL, so I don't need to validate that. I need to validate that the URL loads as expected.
My initial thought process was to do something like the following, but I have never tried to verify an image source URL before; I've always just tried to get the URL in the first place:
function urlTester(url) {
  request(url, (error, response, body) => {
    if (error) {
      console.log('URL does not load');
    } else {
      console.log('URL loads image!');
    }
  });
}
If the URLs are always valid and you only need to check the actual response, you are correct.
(Vanilla) request example:
request(url, function (error, response, body) {
  if (!error && response.statusCode == 200) {
    /* RESPONSE SUCCESS */
  } else {
    /* RESPONSE ERROR */
  }
});
If you want to go further and verify that the response indeed contains an image, you can use response.headers['content-type']:
if (/image\//.test(response.headers['content-type'])) {
  /* The content type contains 'image/' */
} else {
  /* no match with 'image/' */
}
You can also try using request-image-size
Detects image dimensions via request instead of Node.js native
http/https, allowing for options and following redirects by default.
It reduces network traffic by aborting requests as soon as image-size
is able to obtain the image size.
It will return an error if the response is not a valid image:
Since version 2.0.0 it returns an ES6 native Promise that resolves
with the size object or rejects with an Error. Requires Node.js v4+.
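A hedged usage sketch, based on the module's documented promise API (url being whatever the oembed call returned):
const requestImageSize = require('request-image-size');

requestImageSize(url)
  .then((size) => console.log('loads as an image:', size.width, 'x', size.height))
  .catch((err) => console.log('not a loadable image:', err.message));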
As @EMX pointed out, I was headed in the right direction. Here is what I ended up doing:
function testUrl(url) {
  return new Promise((resolve, reject) => {
    request(url, function (error, response, body) {
      if (error) {
        return reject(error);
      }
      resolve(response.statusCode);
    });
  });
}
If the statusCode !== 200, then I can move on as initially required.
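For example (handleFailure is a hypothetical stand-in for the "something else" mentioned earlier):
testUrl(thumbnailUrl)
  .then((statusCode) => {
    if (statusCode !== 200) {
      return handleFailure(); // placeholder for the fallback behaviour
    }
    return thumbnailUrl; // safe to hand back to the caller
  })
  .catch(() => handleFailure());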