Is something like this even possible, or are there better ways to do this? Is what I'm doing even a good idea, or is this a bad approach?
What I want to do is upload a file to my Node.js server. Along with the file I want to send some metadata. The metadata determines whether the file can be saved and the upload accepted, or whether it should be rejected with a 403 response.
I am using busboy and I am sending FormData from my client side.
The example below is very much simplified:
Here is a snippet of the client-side code. I am appending the file as well as the metadata to the form:
const formData = new FormData();
formData.append('name', JSON.stringify({name: "John Doe"}));
formData.append('file', this.selectedFile, this.selectedFile.name);
Here is the Node.js side:
const busboy = require('busboy');
const fs = require('fs');

exports.Upload = async (req, res) => {
  try {
    let acceptUpload = false;
    const bb = busboy({ headers: req.headers });
    bb.on('field', (fieldname, val) => {
      // Verify the metadata here before accepting the file upload
      const data = JSON.parse(val);
      acceptUpload = (data.name === 'John Doe');
    });
    bb.on('file', (fieldname, file, info) => {
      if (acceptUpload) {
        const saveTo = '/upload/file.txt';
        file.pipe(fs.createWriteStream(saveTo));
      } else {
        file.resume(); // drain the stream so busboy can finish
        res.status(403).json({ message: 'Not Authorized' });
      }
    });
    bb.on('finish', () => {
      if (!res.headersSent) {
        res.status(200).json({ message: 'Upload Successful' });
      }
    });
    req.pipe(bb);
  } catch (error) {
    console.log(error);
    res.status(500).json({ message: error.message });
  }
};
So basically, is it even possible for the 'field' event handler to wait for the 'file' event handler? How could one verify some metadata before accepting a file upload?
How can I do validation of all the data in the FormData object before accepting the file upload? Is this even possible, or are there other ways of uploading files with this kind of behaviour? I am even considering adding the data to the request headers, but that does not seem like the ideal solution.
Update
As I suspected, nothing waits. Whichever way I try, the upload first has to complete; only then is it rejected with a 403.
Another Update
I've tried the same thing with multer and had similar results. Even when I can do the validation, the file is completely uploaded from the client side. Only once the upload is complete is the request rejected. The file, however, never gets stored, even though it is uploaded in its entirety.
With busboy, nothing is written to the server if you do not execute the statement file.pipe(fs.createWriteStream(saveTo));
You can prevent more data from even being uploaded to the server by executing the statement req.destroy() in the .on("field", ...) or the .on("file", ...) event handler, even after you have already evaluated some of the fields. Note however, that req.destroy() destroys not only the current HTTP request but the entire TCP connection, which might otherwise have been reused for subsequent HTTP requests. (This applies to HTTP/1.1, in HTTP/2 the relationship between connections and requests is different.)
At any rate, it has no effect on the current HTTP request if everything has already been uploaded. Therefore, whether this saves any network traffic depends on the size of the file. And if the decision whether to req.destroy() involves an asynchronous operation, such as a database lookup, then it may also come too late.
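For illustration, here is a minimal sketch of aborting early inside the 'field' handler, reusing the busboy setup from the question (the John Doe check is the example's own):
bb.on('field', (fieldname, val) => {
  const data = JSON.parse(val);
  if (data.name !== 'John Doe') {
    // Tear down the TCP connection so no further body data is read.
    // As the curl output below shows, the client then sees an empty
    // reply or a connection reset rather than a regular 403 response.
    req.destroy();
  }
});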
Compare
> curl -v -F name=XXX -F file=@<small file> http://your.server
* We are completely uploaded and fine
* Empty reply from server
* Closing connection 0
curl: (52) Empty reply from server
with
> curl -v -F name=XXX -F file=@<large file> http://your.server
> Expect: 100-continue
< HTTP/1.1 100 Continue
* Send failure: Connection was reset
* Closing connection 0
curl: (55) Send failure: Connection was reset
Note that the client sets the Expect header before uploading a large file. You can use that fact in connection with a special request header name in order to block the upload completely:
http.createServer(app)
.on("checkContinue", function(req, res) {
if (req.headers["name"] === "John Doe") {
res.writeContinue(); // sends HTTP/1.1 100 Continue
app(req, res);
} else {
res.statusCode = 403;
res.end("Not authorized");
}
})
.listen(...);
But for small files, which are uploaded without the Expect request header, you still need to check the name header in the app itself.
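On the client side, the metadata then travels as a request header rather than a form field. A minimal sketch, assuming the same browser client as in the question (the header named name mirrors the server-side check above):
const formData = new FormData();
formData.append('file', this.selectedFile, this.selectedFile.name);
// The metadata travels as a header, so the checkContinue handler
// can inspect it before any of the request body is uploaded.
fetch('/upload', {
  method: 'POST',
  headers: { 'name': 'John Doe' },
  body: formData,
});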
Related
I have a strange situation regarding an HTTP server and piping the request.
From my past experience, when piping the request object of an HTTP server to a writable stream of some sort, it does not include the headers, just the payload.
Today, however, I wrote some very simple code, and for some reason I've spent the past two hours trying to figure out why it writes the headers to the file (super confusing!).
Here's my code:
const http = require('http');
const fs = require('fs');

const port = 8080;
const server = http.createServer((req, res) => {
  const f = '/tmp/dest';
  console.log(`writing to ${f}`);
  const s = fs.createWriteStream(f);
  req.pipe(s);
  req.on('end', () => {
    res.end('done');
  });
});
server.listen(port);
I test this with the following curl command:
curl -XPOST -F 'data=@test.txt' localhost:8080
And this is what I'm getting when I'm reading /tmp/dest:
--------------------------993d19e02b7578ff
Content-Disposition: form-data; name="data"; filename="test.txt"
Content-Type: text/plain
hello - this is some text
--------------------------993d19e02b7578ff--
Why am I seeing the headers here? I expected it to only write the payload.
I have code I wrote about a year ago that streams directly to a file without the headers. I don't understand what's different, but that one did the trick:
imageRouter.post('/upload', async(req, res) => {
if(!req.is("image/*")) {
let errorMessage = `the /upload destination was hit, but content-type is ${req.get("Content-Type")}`;
console.log(errorMessage);
res.status(415).send(errorMessage);
return;
}
let imageType = req.get("Content-Type").split('/')[1];
let [ err, writeStream ] = await getWritableStream({ suffix: imageType });
if (err) {
console.log("error while trying to write", err);
return res.status(500).end();
}
let imageName = writeStream.getID();
req.on('end', () => {
req.unpipe();
writeStream.close();
res.json({
imageRelativeLink: `/images/${imageName}`,
imageFullLink: `${self_hostname}/images/${imageName}`
});
});
req.pipe(writeStream);
});
What's different? Why does my code from a year ago (the last block) write without the form-data headers? The resulting file is only an image, without text, but this time (the first block) the HTTP headers show up in the resulting file.
Instead of using pipe, try consuming the request with on('data') events and writing each chunk yourself. This gives you explicit control over exactly what ends up in the file.
Node Streaming Consumer API
const server = http.createServer((req, res) => {
  const f = '/tmp/dest';
  console.log(`writing to ${f}`);
  const s = fs.createWriteStream(f);
  req.on('data', (chunk) => {
    s.write(chunk);
  });
  req.on('end', () => {
    s.close();
    res.end('done');
  });
});
server.listen(port);
As it turns out, I had a mistake in my understanding, and therefore made a mistake in my question.
What I thought were headers was actually the HTTP multipart framing. This is how curl uploads a file when used with the -F syntax.
What I actually needed was to change the way I test my code with curl to one of the following:
cat /path/to/test/file | curl -T - localhost:8080
# or
curl -T - localhost:8080 < /path/to/test/file
# or
curl -T /path/to/test/file localhost:8080
With the -T (or --upload-file) flag, curl uploads the file (or stdin) without wrapping it in a multipart form.
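For completeness, a browser can produce the same kind of raw, non-multipart upload. A minimal sketch, assuming a File object named selectedFile (note that curl -T issues a PUT request, while fetch lets you pick the method):
// Sends the raw file bytes as the request body, with no multipart framing,
// so the server's req.pipe(...) writes exactly the file contents.
fetch('http://localhost:8080', {
  method: 'PUT',
  body: selectedFile,
});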
Context
I am working on a proof of concept for an accounting bot. Part of the solution is the processing of receipts. The user takes a picture of a receipt, the bot asks some questions about it and stores it in the accounting solution.
Approach
I am using the BotFramework Node.js sample 15.handling-attachments, which loads the attachment into an array buffer and stores it on the local filesystem, ready to be picked up and sent to the accounting software's API.
const path = require('path');
const fs = require('fs');
const axios = require('axios');

async function handleReceipts(attachments) {
    const attachment = attachments[0];
    const url = attachment.contentUrl;
    const localFileName = path.join(__dirname, attachment.name);
    try {
        const response = await axios.get(url, { responseType: 'arraybuffer' });
        if (response.headers['content-type'] === 'application/json') {
            // Inline attachments come back as JSON-serialized buffers
            response.data = JSON.parse(response.data, (key, value) => {
                return value && value.type === 'Buffer' ? Buffer.from(value.data) : value;
            });
        }
        fs.writeFile(localFileName, response.data, (fsError) => {
            if (fsError) {
                throw fsError;
            }
        });
    } catch (error) {
        console.error(error);
        return undefined;
    }
    return 'success';
}
Running locally it all works like a charm (also thanks to mdrichardson - MSFT). Deployed to Azure, I get:
There was an error sending this message to your bot: HTTP status code InternalServerError
I narrowed the problem down to the second part of the code, the part that writes to the local filesystem (fs.writeFile). Small files and big files result in the same error on Azure. fs.writeFile seems unable to find the file.
What is happening, according to the streaming logs:
Attachment uploaded by user is saved on Azure
{ contentType: 'image/png',contentUrl:
'https://webchat.botframework.com/attachments//0000004/0/25753007.png?t=< a very long string>',name: 'fromClient::25753007.png' }
localFileName (the destination of the attachment) resolves to
localFileName: D:\home\site\wwwroot\dialogs\fromClient::25753007.png
Axios loads the attachment into an arraybuffer. Its response:
response.headers.content-type: image/png
This is interesting because locally it is 'application/octet-stream'
fs throws an error:
fsError: Error: ENOENT: no such file or directory, open 'D:\home\site\wwwroot\dialogs\fromClient::25753007.png'
Some assistance really appreciated.
Removing the fromClient:: prefix from attachment.name solved it. As #Sandeep mentioned in the comments, the special characters were probably the issue. Not sure what its purpose is. I will mention it in the BotFramework sample library GitHub repository.
[Update] The team will fix this. It was caused by the Direct Line service.
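Until then, stripping the prefix (and any characters that are invalid in Windows file names) before building the path is a workable guard. A minimal sketch, assuming the attachment name from the logs above:
// 'fromClient::25753007.png' -> '25753007.png'
// Drop everything up to and including the '::' marker, then replace any
// remaining characters that are not valid in Windows file names.
const rawName = attachment.name.split('::').pop();
const safeName = rawName.replace(/[<>:"/\\|?*]/g, '_');
const localFileName = path.join(__dirname, safeName);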
I have a Firebase Cloud Function that uses Express to stream a zip file of images to the client. When I test the cloud function locally it works fine. When I deploy to Firebase I get this error:
Error: Can't set headers after they are sent.
What could be causing this error? Memory limit?
export const zipFiles = async (name, params, response) => {
    const zip = archiver('zip', { zlib: { level: 9 } });
    const [files] = await storage.bucket(bucketName).getFiles({ prefix: `${params.agent}/${params.id}/deliverables` });
    if (files.length) {
        response.attachment(`${name}.zip`);
        response.setHeader('Content-Type', 'application/zip');
        response.setHeader('Access-Control-Allow-Origin', '*');
        zip.pipe(response);
        response.on('close', function() {
            return response.send('OK').end(); // <-- this is the line that fails
        });
        files.forEach((file, i) => {
            const reader = storage.bucket(bucketName).file(file.name).createReadStream();
            zip.append(reader, { name: `${name}-${i + 1}.jpg` });
        });
        zip.finalize();
    } else {
        response.status(404).send('Not Found');
    }
};
What Frank said in the comments is true. You need to decide on all your headers, including the HTTP response status, before you start sending any of the content body.
If you intend to express that you're sending a successful response, simply say response.status(200) in the same way that you did for your 404 error. Do that up front. When you're piping a response, you don't need to do anything to close the response at the end. When the pipe is done, the response will automatically be flushed and finalized. You're only supposed to call end() when you want to bail out early without sending a response at all.
Bear in mind that Cloud Functions only supports a maximum payload of 10MB (read more about limits), so if you're trying to zip up more than that in total, it won't work. In fact, there are no "streaming" or chunked responses at all. The entire payload is built in memory and transferred out as a unit.
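Putting that advice together, here is a minimal sketch of the corrected handler, assuming the same archiver and Cloud Storage setup as in the question:
export const zipFiles = async (name, params, response) => {
    const zip = archiver('zip', { zlib: { level: 9 } });
    const [files] = await storage.bucket(bucketName).getFiles({ prefix: `${params.agent}/${params.id}/deliverables` });
    if (files.length) {
        // Decide status and headers up front, before any body bytes are sent.
        response.status(200);
        response.attachment(`${name}.zip`);
        response.setHeader('Content-Type', 'application/zip');
        response.setHeader('Access-Control-Allow-Origin', '*');
        // No 'close' handler: the response is flushed and finalized
        // automatically once the pipe completes.
        zip.pipe(response);
        files.forEach((file, i) => {
            const reader = storage.bucket(bucketName).file(file.name).createReadStream();
            zip.append(reader, { name: `${name}-${i + 1}.jpg` });
        });
        zip.finalize();
    } else {
        response.status(404).send('Not Found');
    }
};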
Yeah, I kind of didn't know how to word the title well...
I have a Node server which receives an image via a POST form. I then want to send this image to Microsoft Vision and the equivalent Google service in order to gather information from both, do some stuff, and return a result to the user that has accessed my server.
My problem is: how do I send the actual data?
This is the actual code that takes care of that:
const microsofComputerVision = require("microsoft-computer-vision");
module.exports = function(req, res)
{
var file;
if(req.files)
{
file = req.files.file;
// Everything went fine
microsofComputerVision.analyzeImage(
{
"Ocp-Apim-Subscription-Key": vision_key,
"content-type": "multipart/form-data",
"body": file.data.toString(),
"visual-features":"Tags, Faces",
"request-origin":"westcentralus"
}).then((result) =>
{
console.log("A");
res.write(result);
res.end();
}).catch((err)=>
{
console.log(err);
res.writeHead(400, {'Content-Type': 'application/json'});
res.write(JSON.stringify({error: "The request must contain an image"}));
res.end();
});
}
else
{
res.writeHead(400, {'Content-Type': 'application/json'});
res.write(JSON.stringify({error: "The request must contain an image"}));
res.end();
}
}
If instead of calling "analyzeImage" i do the following
res.set('Content-Type', 'image/jpg')
res.send(file.data);
res.end();
The browser renders the image correctly, which made me think file.data contains the actual file (considering it's of type Buffer).
But apparently Microsoft does not agree, because when I send the request to Computer Vision I get the following response:
"InvalidImageFormat"
The only examples I found are here, and the "data" used in that example comes from a filesystem read, not straight from a request. But saving the file just to load and then delete it looks like a horrible workaround, so I'd rather know in what form and how I should work on the "file" I have, in order to send it correctly for the API calls.
Edit: if I use file.data (which I thought was the most correct, since it would send the raw image as the body) I get an error saying that I must use a string or a buffer as content. So apparently file.data is not a buffer in the way "body" requires O.o I'm honestly not understanding.
Solved, and the error was quite stupid: the request asked for a buffer, and file.data is indeed a buffer. After checking the type of file.data in every possible way, I started looking for other problems. The real issue, forgive my being stupid, was too stupid to be evident: the result was JSON, and res.write did not accept a JSON object as an argument. The other errors occurred every time I tried using toString() on file.data; in that case the request wasn't accepted.
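In other words, pass file.data through untouched and serialize the JSON result before writing it. A minimal sketch of the corrected call, mirroring the microsoft-computer-vision usage from the question:
microsofComputerVision.analyzeImage(
{
    "Ocp-Apim-Subscription-Key": vision_key,
    "content-type": "multipart/form-data",
    "body": file.data, // the raw Buffer, not file.data.toString()
    "visual-features": "Tags, Faces",
    "request-origin": "westcentralus"
}).then((result) =>
{
    // result is a JSON object, so serialize it before writing
    res.writeHead(200, {'Content-Type': 'application/json'});
    res.write(JSON.stringify(result));
    res.end();
});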
This is how I did it with the Amazon Rekognition image classifier. I know it's not the same service you're using, but hoping this helps a little:
const fs = require('fs');

const imagePath = './bat.jpg';
const bitmap = fs.readFileSync(imagePath);
const params = {
    Image: { Bytes: bitmap },
    MaxLabels: 10,
    MinConfidence: 50.0
};

route.post('/', upload.single('image'), (req, res) => {
    rekognition.detectLabels(params, function(err, data) {
        if (err) {
            console.log('error');
        } else {
            console.log(data);
            res.json(data);
        }
    });
});
I've been trying to stream binary data (PDFs, images, other resources) directly from a request to a remote server, but have had no luck so far. To be clear, I don't want to write the document to any filesystem. The client (browser) makes a request to my Node process, which subsequently makes a GET request to a remote server and streams that data directly back to the client.
var request = require('request');
app.get('/message/:id', function(req, res) {
// db call for specific id, etc.
var options = {
url: 'https://example.com/document.pdf',
encoding: null
};
// First try - unsuccessful
request(options).pipe(res);
// Second try - unsuccessful
request(options, function (err, response, body) {
var binaryData = body.toString('binary');
res.header('content-type', 'application/pdf');
res.send(binaryData);
});
});
Putting both data and binaryData in a console.log shows that the proper data is there, but the subsequent PDF that is downloaded is corrupt. I can't figure out why.
Wow, never mind. It turned out Postman (the Chrome app) was hijacking the request and response somehow. The // First try example in my code excerpt works properly in the browser.