What does calling busboy.end(req.rawBody) do? - node.js

I'm working on an image upload using express and busboy. It currently works, but if I remove the last line, busboy.end(req.rawBody), the request eventually times out. Is this the same as the Node .end() method, and if so, how does this call pass the data to busboy and actually start the parsing?
const busboy = new Busboy({ headers: req.headers });
// Define busboy event listeners
busboy.on("field", (fieldname, value) => {
  // Process fields
});
busboy.on("file", (fieldname, file, filename, encoding, mimetype) => {
  // Process file
});
// Call busboy with rawBody of request
busboy.end(req.rawBody);
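A busboy instance is a Node Writable stream, so busboy.end(chunk) should behave like the standard writable.end([chunk]): it writes one final chunk and then signals that no more input is coming. The sketch below shows that behaviour with a plain Writable from Node's stream module rather than busboy itself; feedSink is a hypothetical helper used only for this illustration.

```javascript
const { Writable } = require('stream');

// Builds a throwaway Writable that records every chunk it receives and
// resolves with the joined text once 'finish' fires.
// (feedSink is a hypothetical helper, not part of busboy or Node.)
function feedSink(feed) {
  return new Promise((resolve, reject) => {
    const received = [];
    const sink = new Writable({
      write(chunk, encoding, callback) {
        received.push(chunk.toString());
        callback();
      },
    });
    sink.on('finish', () => resolve(received.join('')));
    sink.on('error', reject);
    feed(sink);
  });
}

// One call: hand over the whole body and signal the end of input.
const viaEnd = feedSink((sink) => sink.end(Buffer.from('whole body')));

// The equivalent two-step form.
const viaWriteThenEnd = feedSink((sink) => {
  sink.write(Buffer.from('whole body'));
  sink.end();
});

Promise.all([viaEnd, viaWriteThenEnd]).then(([a, b]) => {
  console.log('same data received:', a === b);
});
```

Without the end() call the stream never learns that the input is complete, so 'finish' never fires and a handler that waits for it keeps the request open until it times out.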

Related

Firebase Cloud Function upload excel file to parse data and save it to Firestore

Use case: I need to upload an .xlsx file to a Cloud Function API, parse the data to JSON, and upload it to Firebase Firestore. I am using TypeScript to write the Cloud Functions.
I read about multer and understood that using multer with Firebase has some issues. Busboy was the other option.
I went through this example code and tried to apply it in my code: https://gist.github.com/tonkla/5e893aa8776923ad6a2c9c6b7c432f3d
The headers['content-type'] I am sending to busboy is 'application/x-www-form-urlencoded'.
To check the file upload I used Postman to upload the file to my URL, but the request times out.
export const fileUpload = functions.https.onRequest(async (req, res) => {
  const data = await parseMultipartFormData(req.rawBody,req.headers['content-type'])
  console.log("Data from busboy:"+data)
  const workbook = xlsx.read(data,{type: 'buffer'})
  const jsonRows = xlsx.utils.sheet_to_json(workbook.Sheets[workbook.SheetNames[0]])
  res.status(200).json(jsonRows)
});
function parseMultipartFormData(rawBody, headers) {
  return new Promise((resolve, reject) => {
    const buffers : any[]= []
    const busboy = new busboyMain({
      headers: { 'content-type': headers },
    })
    busboy.on('file', (fieldname, file, filename, encoding, mimetype) => {
      file.on('data', data => {
        buffers.push(data);
      })
      file.on('end', () => {
        resolve(Buffer.concat(buffers));
      })
    })
    busboy.on('error', error => reject(error))
    busboy.end(rawBody)
  })
}
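One likely reason for the timeout: with a content-type of 'application/x-www-form-urlencoded', busboy treats the body as plain form fields and never emits 'file', so the promise in parseMultipartFormData never settles. Below is a minimal sketch of a variant that always settles. It assumes the caller passes in an already constructed busboy instance as parser; collectFirstFile is a hypothetical name.

```javascript
// Resolves with the first uploaded file as a Buffer, or rejects if the
// parser finishes without ever seeing a file part.
// `parser` is assumed to be a busboy instance (any Writable that emits
// 'file', 'finish' and 'error' the same way would work).
function collectFirstFile(parser, rawBody) {
  return new Promise((resolve, reject) => {
    let sawFile = false;

    parser.on('file', (fieldname, file) => {
      sawFile = true;
      const chunks = [];
      file.on('data', (chunk) => chunks.push(chunk));
      file.on('end', () => resolve(Buffer.concat(chunks)));
      file.on('error', reject);
    });

    // 'finish' fires once the whole body has been parsed. If no file
    // showed up by then, reject instead of hanging forever.
    parser.on('finish', () => {
      if (!sawFile) reject(new Error('no file part found in request body'));
    });

    parser.on('error', reject);
    parser.end(rawBody);
  });
}
```

With this shape the Cloud Function can answer with a 4xx status when the request is not multipart/form-data, instead of waiting for the platform timeout.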

Upload file using NodeJS and BusBoy

I am uploading a file using NodeJS. My requirement is to read the stream into a variable so that I can store it in AWS SQS. I do not want to store the file on disk. Is this possible? I only need the uploaded file as a stream. The code I am using is (upload.js):
var http = require('http');
var Busboy = require('busboy');
module.exports.UploadImage = function (req, res, next) {
  var busboy = new Busboy({ headers: req.headers });
  // Listen for event when Busboy finds a file to stream.
  busboy.on('file', function (fieldname, file, filename, encoding, mimetype) {
    // We are streaming! Handle chunks
    file.on('data', function (data) {
      // Here we can act on the data chunks streamed.
    });
    // Completed streaming the file.
    file.on('end', function (stream) {
      //Here I need to get the stream to send to SQS
    });
  });
  // Listen for event when Busboy finds a non-file field.
  busboy.on('field', function (fieldname, val) {
    // Do something with non-file field.
  });
  // Listen for event when Busboy is finished parsing the form.
  busboy.on('finish', function () {
    res.statusCode = 200;
    res.end();
  });
  // Pipe the HTTP Request into Busboy.
  req.pipe(busboy);
};
How do I get the uploaded stream?
In busboy's 'file' event you get a parameter named 'file'. It is a stream, so you can pipe it.
For example:
busboy.on('file', function (fieldname, file, filename, encoding, mimetype) {
  file.pipe(streamToSQS)
});
I hope that helps.
busboy.on('file', function (fieldname, file, filename, encoding, mimetype) {
  // Use a fixed name instead of the client-supplied filename
  var uploadName = "filename";
  s3Helper.pdfUploadToS3(file, uploadName);
});
busboy.on('finish', function () {
  res.status(200).json({ 'message': "File uploaded successfully." });
});
req.pipe(busboy);
While the existing answers assume you can simply pipe the stream (file) to something that accepts a stream, the actual chunks arrive in the file event handlers you have already implemented.
From the docs (https://www.npmjs.com/package/busboy):
file.on('data', function(data) {
  // data.length bytes seems to indicate a chunk
  console.log('File [' + fieldname + '] got ' + data.length + ' bytes');
});
file.on('end', function() {
  console.log('File [' + fieldname + '] Finished');
});
Update:
Found the docs for the 'file' event: its second argument is a readable stream.
file(< string >fieldname, < ReadableStream >stream, < string >filename, < string >transferEncoding, < string >mimeType) - Emitted for each new file form field found. transferEncoding contains the 'Content-Transfer-Encoding' value for the file stream. mimeType contains the 'Content-Type' value for the file stream.
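If the goal is to hold the whole upload in a variable rather than pipe it onward, the chunks from the 'data' events can be collected and joined when 'end' fires. A minimal sketch using only Node built-ins; streamToBuffer is a hypothetical helper name, and the readable argument stands in for the file stream busboy passes to the 'file' handler.

```javascript
// Collects every chunk of a readable stream into a single Buffer.
// Nothing is written to disk; the whole file is held in memory, so this
// only suits uploads that are known to be small.
function streamToBuffer(readable) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    readable.on('data', (chunk) => chunks.push(chunk));
    readable.on('end', () => resolve(Buffer.concat(chunks)));
    readable.on('error', reject);
  });
}

// Inside a busboy 'file' handler this could be used as:
//   streamToBuffer(file).then((buffer) => { /* send buffer onward */ });
```

Note that the 'end' event itself carries no argument, so the data has to be gathered from the 'data' events (or by piping) before 'end' fires.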

How to await fields before processing the file stream in multipart form

I'm using SendGrid for receiving files via email. SendGrid parses the incoming emails and sends the files in a multipart form to an endpoint I have set up.
I don't want the files on my local disk, so I stream them straight to Amazon S3. This works perfectly.
But before I can stream to S3 I need to get hold of the destination mail address so I can work out the correct S3 folder. This is sent in a field named "to" in the form post. Unfortunately this field sometimes arrives after the files, so I need a way to await the to-field before I'm ready to take the stream.
I thought I could wrap the onField in a promise and await the to-field from within the onFile. But this concept seems to lock itself up when the field arrives after the file.
I'm new to both streams and promises. I would really appreciate it if someone could tell me how to do this.
This is the non-working, pseudo-ish code:
function sendGridUpload(req, res, next) {
  var busboy = new Busboy({ headers: req.headers });
  var awaitEmailAddress = new Promise(function(resolve, reject) {
    busboy.on('field', function(fieldname, val, fieldnameTruncated, valTruncated) {
      if(fieldname === 'to') {
        resolve(val);
      } else {
        return;
      }
    });
  });
  busboy.on('file', function(fieldname, file, filename, encoding, mimetype) {
    function findInbox(emailAddress) {
      console.log('Got email address: ' + emailAddress);
      // ..find the inbox and generate an s3Key
      return s3Key;
    }
    function saveFileStream(s3Key) {
      // ..pipe the file directly to S3
    }
    awaitEmailAddress.then(findInbox)
      .then(saveFileStream)
      .catch(function(err) {
        log.error(err)
      });
  });
  req.pipe(busboy);
}
I finally got this working. The solution is not very pretty, and I have actually switched to another concept (described at the end of the post).
To buffer the incoming data until the "to"-field arrives I used stream-buffers by #samcday. When I get hold of the to-field I release the readable stream to the pipes lined up for the data.
Here is the code (some parts omitted, but essential parts are there).
var streamBuffers = require('stream-buffers');
function postInboundMail(req, res, next) {
  var busboy = new Busboy({ headers: req.headers});
  //Sometimes the fields arrive after the files are streamed.
  //We need the "to"-field before we are ready for the files.
  //Therefore the onField is wrapped in a promise which gets
  //resolved when the to-field arrives
  var awaitEmailAddress = new Promise(function(resolve, reject) {
    busboy.on('field', function(fieldname, val, fieldnameTruncated, valTruncated) {
      var emailAddress;
      if(fieldname === 'to') {
        try {
          emailAddress = emailRegexp.exec(val)[1]
          resolve(emailAddress)
        } catch(err) {
          return reject(err);
        }
      } else {
        return;
      }
    });
  });
  busboy.on('file', function(fieldname, file, filename, encoding, mimetype) {
    var inbox;
    //I'm using readableStreamBuffer to accumulate the data before
    //I get the email field so I can send the stream through to S3
    var readBuf = new streamBuffers.ReadableStreamBuffer();
    //I have to pause readBuf immediately. Otherwise stream-buffers starts
    //sending as soon as I put data into it with put().
    readBuf.pause();
    function getInbox(emailAddress) {
      return model.inbox.findOne({email: emailAddress})
        .then(function(result) {
          if(!result) return Promise.reject(new Error(`Inbox not found for ${emailAddress}`))
          inbox = result;
          return Promise.resolve();
        });
    }
    function saveFileStream() {
      console.log('=========== starting stream to S3 ========= ' + filename)
      //Have to resume readBuf since we paused it before
      readBuf.resume();
      //file.save will approximately do the following:
      // readBuf.pipe(gzip).pipe(encrypt).pipe(S3)
      return model.file.save({
        inbox: inbox,
        fileStream: readBuf
      });
    }
    awaitEmailAddress.then(getInbox)
      .then(saveFileStream)
      .catch(function(err) {
        log.error(err)
      });
    file.on('data', function(data) {
      //Fill readBuf with data as it arrives
      readBuf.put(data);
    });
    file.on('end', function() {
      //This was the only way I found to get the S3 streaming finished.
      //destroySoon will let the pipes finish reading, but no more writes are allowed
      readBuf.destroySoon()
    });
  });
  busboy.on('finish', function() {
    res.writeHead(202, { Connection: 'close', Location: '/' });
    res.end();
  });
  req.pipe(busboy);
}
I would very much like feedback on this solution, even though I'm not using it. I have a feeling that this can be done much more simply and elegantly.
New solution:
Instead of waiting for the to-field, I send the stream directly to S3. I figured that the more stuff I put between the incoming stream and the S3 saving, the higher the risk of losing the incoming file due to a bug in my code. (SendGrid will eventually resend the file if I don't respond with 200, but it will take some time.)
This is how I do it:
1. Save a placeholder for the file in the database
2. Pipe the stream to S3
3. Update the placeholder with more information as it arrives
This solution also gives me the opportunity to easily get hold of unsuccessful uploads since the placeholders for unsuccessful uploads will be incomplete.
//Michael
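The lock-up described in the question is likely backpressure: nothing reads the file stream until the promise resolves, and busboy does not move on to later parts (including the to-field) while a file stream sits unread. The buffering idea from the first solution can be sketched with Node's built-in stream module alone, assuming the files are small enough to hold in memory. holdUntil is a hypothetical helper name; source stands in for busboy's file stream and ready for the awaitEmailAddress promise.

```javascript
const { PassThrough } = require('stream');

// Drains `source` into memory right away (so the parser upstream is
// never blocked) and returns a stream that only starts delivering the
// data once the `ready` promise resolves.
function holdUntil(source, ready) {
  const held = [];
  const out = new PassThrough();
  let sourceEnded = false;
  let released = false;

  source.on('data', (chunk) => {
    if (released) out.write(chunk);
    else held.push(chunk);
  });
  source.on('end', () => {
    sourceEnded = true;
    if (released) out.end();
  });
  source.on('error', (err) => out.destroy(err));

  ready.then(() => {
    released = true;
    // Replay everything that arrived while we were waiting.
    for (const chunk of held) out.write(chunk);
    held.length = 0;
    if (sourceEnded) out.end();
  }, (err) => out.destroy(err));

  return out;
}

// Inside a busboy 'file' handler this could be used as:
//   var delayed = holdUntil(file, awaitEmailAddress);
//   awaitEmailAddress.then(getInbox).then(function () { /* pipe delayed to S3 */ });
```

The trade-off is memory: every byte that arrives before the field is kept in RAM, which is one more argument for the placeholder approach when uploads can be large.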

Streaming an uploaded file to an HTTP request

My goal is to accept an uploaded file and stream it to Wistia using the Wistia Upload API. I need to be able to add fields to the HTTP request, and I don't want the file to touch the disk. I'm using Node, Express, Request, and Busboy.
The code below has two console.log statements. The first returns [Error: not implemented] and the second returns [Error: form-data: not implemented]. I'm new to streaming in Node, so I'm probably doing something fundamentally wrong. Any help would be much appreciated.
app.use("/upload", function(req, res, next) {
  var writeStream = new stream.Writable();
  writeStream.on("error", function(error) {
    console.log(error);
  });
  var busboy = new Busboy({headers: req.headers});
  busboy.on("file", function(fieldname, file, filename, encoding, mimetype) {
    file.on("data", function(data) {
      writeStream.write(data);
    });
    file.on("end", function() {
      request.post({
        url: "https://upload.wistia.com",
        formData: {
          api_password: "abc123",
          file: new stream.Readable(writeStream)
        }
      }, function(error, response, body) {
        console.log(error);
      });
    });
  });
  req.pipe(busboy);
});
I am not too familiar with the busboy module, but the errors you are getting come from attempting to use unimplemented streams. Whenever you create a new readable or writable stream directly from the stream module, you have to implement the _read and _write methods respectively (see "API for Stream Implementers" in the node.js docs). To give you something to work with, the following example uses multer for handling multipart requests. I think you'll find multer easier to use than busboy.
var app = require('express')();
var fs = require('fs');
var multer = require('multer');
var request = require('request');
// Note: this assumes the pre-1.0 multer API, which saves uploads to disk
// and exposes them on req.files
app.use(multer());
app.post("/upload", function(req, res, next) {
  // create a read stream from the file multer saved
  var readable = fs.createReadStream(req.files.myfile.path);
  request.post({
    url: "https://upload.wistia.com",
    formData: {
      api_password: "abc123",
      file: readable
    }
  }, function(err, response, body) {
    // send something to client
  })
});
I hope this helps. Unfortunately I am not familiar with busboy, but this should work with multer. As I said before, the problem is just that you are using unimplemented streams. I'm sure there is a way to do the same thing with busboy if you want to.
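To illustrate the point about unimplemented streams, here is a minimal sketch of a Readable and a Writable that do supply their own read and write logic through the constructor options. It uses only Node's stream module; makeCounterSource and makeCollector are hypothetical names for this illustration and are unrelated to busboy or Wistia.

```javascript
const { Readable, Writable } = require('stream');

// A Readable needs a read() implementation that pushes data,
// and pushes null when there is nothing left.
function makeCounterSource(limit) {
  let n = 0;
  return new Readable({
    read() {
      n += 1;
      this.push(n <= limit ? String(n) : null);
    },
  });
}

// A Writable needs a write() implementation that consumes each chunk
// and calls the callback when it is ready for the next one.
function makeCollector(done) {
  const chunks = [];
  return new Writable({
    write(chunk, encoding, callback) {
      chunks.push(chunk.toString());
      callback();
    },
    // final() runs after the last write, just before 'finish'.
    final(callback) {
      done(chunks.join(''));
      callback();
    },
  });
}

makeCounterSource(3).pipe(makeCollector((text) => {
  console.log('collected:', text);
}));
```

When the only goal is to hand data from one side to the other unchanged, stream.PassThrough already implements both halves and can be used instead of writing either method by hand.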
If you want to use multipart (another npm) here is a tutorial:
http://qnimate.com/stream-file-uploads-to-storage-server-in-node-js/

busboy - is there a way to send the response when all files have been uploaded?

I'm trying to upload files to a server using node.js as backend and angular.js as frontend. I'm using express 4 + busboy for this. I have a table in the frontend where I should display all the files I'm uploading. So if I have 3 files and click on upload, angular should post these files to node.js and after getting the response back, refresh the table with those three files.
This is the function I'm using in angular:
function uploadFiles(files){
  var fd = new FormData();
  for(var i = 0; i<files.length; i++){
    fd.append("file", files[i]);
  }
  $http.post('http://localhost:3000/upload', fd, {
    withCredentials: false,
    headers: {'Content-Type': undefined },
    transformRequest: angular.identity
  }).success(refreshTable()).error(function(){
    console.log("error uploading");
  });
}
and this is from node.js:
app.post('/upload', function(req, res) {
  var busboy = new Busboy({ headers: req.headers });
  busboy.on('file', function (fieldname, file, filename) {
    console.log("Uploading: " + filename);
    var fstream = fs.createWriteStream('./files/' + filename);
    file.pipe(fstream);
  });
  busboy.on('finish', function(){
    res.writeHead(200, { 'Connection': 'close' });
    res.end("");
  });
  return req.pipe(busboy);
});
The problem is that if I upload three files, node.js sends the response as soon as the first file has been uploaded, so the table is updated with only the first file. If I refresh the page, the rest of the files appear.
I think the problem is with this line in node: return req.pipe(busboy); If I remove that line, the POST response stays pending for a long time and nothing happens. I think this is an async problem. Does anybody know a way to send the response back only when all files have been uploaded?
Thanks
A simple and common solution to this particular problem is to use a counter variable and listen for the finish event on the fs Writable stream. For example:
app.post('/upload', function(req, res) {
  var busboy = new Busboy({ headers: req.headers });
  var files = 0, finished = false;
  busboy.on('file', function (fieldname, file, filename) {
    console.log("Uploading: " + filename);
    ++files;
    var fstream = fs.createWriteStream('./files/' + filename);
    fstream.on('finish', function() {
      if (--files === 0 && finished) {
        res.writeHead(200, { 'Connection': 'close' });
        res.end("");
      }
    });
    file.pipe(fstream);
  });
  busboy.on('finish', function() {
    finished = true;
  });
  return req.pipe(busboy);
});
The reason for this is that busboy's finish event is emitted once the entire request has been fully processed, that includes files. However, there is some delay between when there is no more data to write to a particular file and when the OS/node has flushed its internal buffers to disk (and the closing of the file descriptor). Listening for the finish event for a fs Writable stream lets you know that the file descriptor has been closed and no more writes are going to occur.
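The same bookkeeping can also be expressed with promises instead of a manual counter. The sketch below is one possible shape, assuming a Node version that ships the stream/promises module (Node 15 or later); createWriteTracker is a hypothetical helper name.

```javascript
const { finished } = require('stream/promises');

// Remembers one promise per write stream; each promise settles when
// that stream has flushed everything (or rejects if it errored).
function createWriteTracker() {
  const writes = [];
  return {
    track(writable) {
      writes.push(finished(writable));
    },
    allDone() {
      return Promise.all(writes);
    },
  };
}

// Possible use inside the route handler from the answer above:
//   var tracker = createWriteTracker();
//   busboy.on('file', function (fieldname, file, filename) {
//     var fstream = fs.createWriteStream('./files/' + filename);
//     tracker.track(fstream);
//     file.pipe(fstream);
//   });
//   busboy.on('finish', function () {
//     tracker.allDone().then(function () { res.end(""); });
//   });
```

Because busboy emits 'finish' only after every 'file' event has fired, the list of tracked streams is complete by the time allDone() is called. A failed write rejects the combined promise, which gives one place to send an error response.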
