How do I stop Express.js from duplicating long multipart requests? - node.js

Consider the following code:
var routes = function(app) {
    app.post('/api/video', passport.authenticate('token', authentication), video.createVideo);
};

function createVideo(request, response) {
    logger.info('starting create video');
    upload(request, response, function(err) {
        logger.info('upload finished', err);
        //callback omitted for brevity
    });
}
Upload is multer with multer-s3 middleware:
var upload = multer({
    storage: s3({
        dirname: config.apis.aws.dirname,
        bucket: config.apis.aws.bucket,
        secretAccessKey: config.apis.aws.secretAccessKey,
        accessKeyId: config.apis.aws.accessKeyId,
        region: config.apis.aws.region,
        filename: function(req, file, cb) {
            cb(null, req.user._id + '/' + uuid.v4() + path.extname(file.originalname));
        }
    }),
    limits: {
        fileSize: 1000000000
    },
    fileFilter: function(req, file, cb) {
        if (!_.contains(facebookAllowedTypes, path.extname(file.originalname))) {
            return cb(new Error('Only following types are allowed: ' + facebookAllowedTypes));
        }
        cb(null, true);
    }
}).fields([{
    name: 'video',
    maxCount: 1
}]);
The code above takes a file that is sent from somewhere and streams it to an AWS S3 bucket. multer-s3 uses s3fs in the background to create a write stream and send the file as 5MB multipart chunks.
With big files, like 300MB, the upload can take minutes. And now something really strange happens. I can see in our frontend that it sends only one POST request to /api/video. I also tried making the request from Postman, not trusting our frontend.
It starts the upload, but after around 2 minutes a second upload starts! If I upload smaller files, around 2-100MB, nothing of the sort happens. This is from my logs (from the code above):
{"name":"test-app","hostname":"zawarudo","pid":16953,"level":30,"msg":"starting create video","time":"2015-12-02T14:08:22.243Z","src":{"file":"/home/areinu/dev/projects/test-app-uploader/backend/app/services/videoService.js","line":169,"func":"createVideo"},"v":0}
{"name":"test-app","hostname":"zawarudo","pid":16953,"level":30,"msg":"starting create video","time":"2015-12-02T14:10:28.794Z","src":{"file":"/home/areinu/dev/projects/test-app-uploader/backend/app/services/videoService.js","line":169,"func":"createVideo"},"v":0}
{"name":"test-app","hostname":"zawarudo","pid":16953,"level":30,"msg":"upload finished undefined","time":"2015-12-02T14:12:46.433Z","src":{"file":"/home/areinu/dev/projects/test-app-uploader/backend/app/services/videoService.js","line":171},"v":0}
{"name":"test-app","hostname":"zawarudo","pid":16953,"level":30,"msg":"upload finished undefined","time":"2015-12-02T14:12:49.627Z","src":{"file":"/home/areinu/dev/projects/test-app-uploader/backend/app/services/videoService.js","line":171},"v":0}
As you can see, both uploads end a few milliseconds apart, but the second one starts 2 minutes after the first. The problem is that there should be only one upload!
All I did in Postman was set my access token (so passport authorizes me) and add a file. This should create only one upload, yet two happen, and both upload the same file.
Also notice that both files get uploaded: they have different uuids (the filename function builds the file name from a uuid), both appear on S3, both have the proper size of 300MB, and both can be downloaded and work.
With smaller uploads the duplication doesn't occur. What is the reason for this behavior? How can I fix it?

The problem was very simple (I only spent a whole day figuring it out). It's just the default timeout of Node requests: 2 minutes. I don't know why it started another request, nor why both actually completed, but setting the default timeout on my server to 10 minutes fixed the issue.
If someone knows why the timed-out requests actually did complete (and twice), please let me know. I'll improve the answer then.
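For reference, a minimal sketch of applying that fix, assuming the Express app is started with app.listen() (the port number is a placeholder):
var server = app.listen(3000);
// Node's HTTP server applies a 2-minute socket timeout by default;
// raise it to 10 minutes so long multipart uploads are not cut off.
server.setTimeout(10 * 60 * 1000);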

Related

Cancel File Upload: Multer, MongoDB

I can't seem to find any up-to-date answers on how to cancel a file upload using Mongo, NodeJS & Angular. I've only come across some tutorials on how to delete a file, but that is NOT what I am looking for. I want to be able to cancel the file uploading process by clicking a button on my front-end.
I am storing my files directly in MongoDB in chunks using the Mongoose, Multer & GridFSBucket packages. I know that I can stop a file's upload on the front-end by unsubscribing from the subscription responsible for the upload, but the upload process keeps going in the back-end when I unsubscribe (yes, I have double and triple checked: all the chunks keep getting uploaded until the file is fully uploaded).
Here is my Angular code:
ngOnInit(): void {
  // Upload the file.
  this.sub = this.mediaService.addFile(this.formData).subscribe((event: HttpEvent<any>) => {
    console.log(event);
    switch (event.type) {
      case HttpEventType.Sent:
        console.log('Request has been made!');
        break;
      case HttpEventType.ResponseHeader:
        console.log('Response header has been received!');
        break;
      case HttpEventType.UploadProgress:
        // Update the upload progress!
        this.progress = Math.round(event.loaded / event.total * 100);
        console.log(`Uploading! ${this.progress}%`);
        break;
      case HttpEventType.Response:
        console.log('File successfully uploaded!', event.body);
        this.body = 'File successfully uploaded!';
    }
  },
  err => {
    this.progress = 0;
    this.body = 'Could not upload the file!';
  });
}
Cancel the upload:
cancel() {
  // Unsubscribe from the upload method.
  this.sub.unsubscribe();
}
Here is my NodeJS (Express) code:
...
// Configure a strategy for uploading files.
const multerUpload = multer({
  // Set the storage strategy.
  storage: storage,
  // Set the size limit for uploading a file to 120MB.
  limits: { fileSize: 1024 * 1024 * 120 },
  // Set the file filter.
  fileFilter: fileFilter
});

// Add new media to the database.
router.post('/add', [multerUpload.single('file')], async (req, res) => {
  return res.status(200).send();
});
What is the right way to cancel the upload without leaving any chunks in the database?
So I have been trying to get to the bottom of this for 2 days now and I believe I have found a satisfying solution:
First, in order to cancel the file upload and delete any chunks that have already been uploaded to MongoDB, you need to adjust the fileFilter in your multer configuration so that it detects when the request has been aborted and the upload stream has ended, and then rejects the upload by passing an error to fileFilter's callback:
// Adjust what files can be stored.
const fileFilter = function(req, file, callback){
  console.log('The file being filtered', file)
  req.on('aborted', () => {
    file.stream.on('end', () => {
      console.log('Cancel the upload')
      callback(new Error('Cancel.'), false);
    });
    file.stream.emit('end');
  })
}
NOTE THAT: when canceling a file upload, you must wait for the changes to show up in your database. The chunks that have already been sent to the database will first have to finish uploading before the canceled file gets deleted. This might take a while depending on your internet speed and how many bytes were sent before canceling the upload.
Finally, you might want to set up a route in your backend to delete any chunks from files that have not been fully uploaded to the database (due to some error that occurred during the upload). To do that you'll need to fetch all the file IDs from your .chunks collection (by following the method specified on this link), separate the IDs of files whose chunks were only partially uploaded from the IDs of files that were fully uploaded, and then call GridFSBucket's delete() method on the former to get rid of the redundant chunks. This step is purely optional and for database maintenance reasons.
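As a rough sketch of that optional cleanup (not the answer's exact code; the route path, collection names and db handle are assumptions): since GridFS only writes a document to the .files collection once an upload completes, the redundant chunks can be found by looking for files_id values in .chunks that have no matching .files document and removing them directly.
// Hypothetical maintenance route; assumes the connected Db instance is stored in req.app.locals.db
// and that the bucket uses the default-style collection names uploads.files / uploads.chunks.
router.delete('/cleanup', async (req, res) => {
  const db = req.app.locals.db;
  const chunkFileIds = await db.collection('uploads.chunks').distinct('files_id');
  for (const id of chunkFileIds) {
    const fileDoc = await db.collection('uploads.files').findOne({ _id: id });
    if (!fileDoc) {
      // No .files document means this upload never completed; drop its orphaned chunks.
      await db.collection('uploads.chunks').deleteMany({ files_id: id });
    }
  }
  return res.status(200).send();
});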
Try a try/catch approach. There are two ways it can be done.
One is by calling an API that takes the file currently being uploaded as its parameter, and then on the backend deleting and clearing the chunks that are present on the server.
The other is by handling it as an exception: send the file size as a validation value. If the backend API has received the file in its full size, keep it; if the received file is smaller (because the upload was cancelled in between), do the clearance steps, i.e. take the ID and clear that file's chunks from the Mongo database.

ERR_CONNECTION_RESET when uploading a large file with nodejs and multer

I'm writing a web application that allows users to upload very large files (up to gigabytes). My technical stack includes: nodejs, express, multer and pure html. It works fine for small files, but when I upload a big file (127 MB) I get the error ERR_CONNECTION_RESET after waiting a while (about 2 minutes).
I tried extending the response time on the server, using both req.setTimeout and res.setTimeout, but it didn't help. It may be because the frontend waits too long to get a response.
Below is the error I got:
Thank you all.
Increasing the res-timeout for the corresponding upload-route should definitely work. Try doing it like this:
function extendTimeout(req, res, next) {
  // Adjust the value for the timeout; here it's set to 3 minutes.
  res.setTimeout(180000, () => {
    // You can handle the timeout error here.
  });
  next();
}

app.post('/your-upload-route', extendTimeout, upload.single('your-file'), (req, res, next) => {
  // handle file upload
});

AWS lambda function issue with FormData file upload

I have Node.js code which uploads files to an S3 bucket.
I have used the koa web framework and the following are the dependencies:
"#types/koa": "^2.0.48",
"#types/koa-router": "^7.0.40",
"koa": "^2.7.0",
"koa-body": "^4.1.0",
"koa-router": "^7.4.0",
following is my sample router code:
import Router from "koa-router";

const router = new Router({ prefix: '/' })
router.post('file/upload', upload)

async function upload(ctx: any, next: any) {
  const files = ctx.request.files
  if (files && files.file) {
    const extension = path.extname(files.file.name)
    const type = files.file.type
    const size = files.file.size
    console.log("file Size--------->:: " + size);
    sendToS3();
  }
}
function sendToS3() {
  const params = {
    Bucket: bName,
    Key: kName,
    Body: imageBody,
    ACL: 'public-read',
    ContentType: fileType
  };
  s3.upload(params, function (error: any, data: any) {
    if (error) {
      console.log("error", error);
      return;
    }
    console.log('s3Response', data);
    return;
  });
}
The request body is sent as FormData.
Now when I run this code locally and hit the request, the file gets uploaded to my S3 bucket and can be viewed.
In the console the file size is displayed as follows:
which is the correct actual size of the file.
But when I deploy this code as a lambda function and hit the request, I see that the file size has suddenly increased (cloudwatch log screenshot below).
That file still gets uploaded to S3, but the issue is that when I open the file it shows the following error.
I further tried to find out whether this behaviour persisted on a standalone instance on AWS, but it did not. So the problem occurs only when the code is deployed as a serverless lambda function.
I tried with postman as well as my own front end app, but the issue remains.
I don't know whether I have overlooked any configuration when setting up the lambda function that handles such scenarios.
This is an issue I have not encountered before, and I would really like to know if anyone else has encountered the same. Also I am not able to debug and find out why the file size is increasing. I can only assume that when the file reaches the service, some kind of encoding/padding is being done on it.
Finally I was able to fix this issue. I had to add a "Binary Media Type" in AWS API Gateway.
The following steps helped:
AWS API Gateway console -> "API" -> "Settings" -> "Binary Media Types" section.
Add the following media type:
multipart/form-data
Save the changes.
Deploy the API.
More info: https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-payload-encodings-configure-with-console.html

Google Cloud Storage "invalid upload request" error. Bad request

I have been uploading files to Google Cloud Storage from my node js server for quite a long time now, but sometimes the upload fails. The error message returned is something like:
{
  reason: "badRequest",
  code: 400,
  message: "Invalid upload request"
}
It happens randomly, roughly once every 25-30 days, persists for some time, and then resolves automatically.
It's kind of weird, and searching for it didn't give any solution or reason.
The upload request is sent for two files in parallel containing exactly the same data;
one was uploaded successfully and the other failed.
code used:
const file = bucket.file(`data/${id}/${version}/abc.json`);
const dataBuffer = Buffer.from(JSON.stringify(dataToUpload));
file.save(dataBuffer, storageConfig)
  .then(() => callback(null, true))
  .catch(err => callback(err, null));
where storageConfig is
{
  "contentType": "application/json",
  "cacheControl": "public, max-age=600, s-maxage=3, no-transform"
}
and the second file that is stored is
const file = bucket.file(`data/${id}/latest/abc.json`);
I am not able to find any reason for it and am unable to handle it.
It crashed my related systems, as they require that second file.
Setting resumable: false in the upload options solved the same error for me. For example: bucket.upload(pathToUpload, { destination: bucketPath, resumable: false })
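Applied to the question's code, that would presumably mean adding resumable: false to the options passed to file.save() alongside the existing storageConfig (a sketch, not tested against the asker's setup):
// Merge resumable: false into the existing save options.
file.save(dataBuffer, Object.assign({ resumable: false }, storageConfig))
  .then(() => callback(null, true))
  .catch(err => callback(err, null));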

S3 file upload stream using node js

I am trying to find a solution to stream files to Amazon S3 using a node js server, with these requirements:
Don't store a temp file on the server or in memory. Buffering up to some limit (not the complete file) may be used for uploading.
No restriction on uploaded file size.
Don't freeze the server until the complete file is uploaded, because with a heavy file upload other requests' waiting time will unexpectedly increase.
I don't want to use direct file upload from the browser, because the S3 credentials would need to be shared in that case. Another reason to upload the file from the node js server is that some authentication may also need to be applied before uploading the file.
I tried to achieve this using node-multiparty, but it was not working as expected. You can see my solution and issue at https://github.com/andrewrk/node-multiparty/issues/49. It works fine for small files but fails for a file of size 15MB.
Any solution or alternative?
You can now use streaming with the official Amazon SDK for nodejs in the section "Uploading a File to an Amazon S3 Bucket" or see their example on GitHub.
What's even more awesome, you can finally do so without knowing the file size in advance. Simply pass the stream as the Body:
var AWS = require('aws-sdk');
var fs = require('fs');
var zlib = require('zlib');

var body = fs.createReadStream('bigfile').pipe(zlib.createGzip());
var s3obj = new AWS.S3({params: {Bucket: 'myBucket', Key: 'myKey'}});
s3obj.upload({Body: body})
  .on('httpUploadProgress', function(evt) { console.log(evt); })
  .send(function(err, data) { console.log(err, data) });
For your information, the v3 SDK was published with a dedicated module to handle that use case: https://www.npmjs.com/package/@aws-sdk/lib-storage
Took me a while to find it.
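A minimal sketch of that v3 approach with @aws-sdk/lib-storage (the region, bucket, key and file name are placeholders):
const fs = require('fs');
const { S3Client } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');

// Upload manages the multipart logic and accepts a stream of unknown length as Body.
const upload = new Upload({
  client: new S3Client({ region: 'us-east-1' }),
  params: { Bucket: 'myBucket', Key: 'myKey', Body: fs.createReadStream('bigfile') },
});
upload.on('httpUploadProgress', (progress) => console.log(progress));
upload.done()
  .then((data) => console.log(data))
  .catch((err) => console.error(err));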
Give https://www.npmjs.org/package/streaming-s3 a try.
I used it for uploading several big files in parallel (>500MB), and it worked very well.
It is very configurable and also allows you to track uploading statistics.
You don't need to know the total size of the object, and nothing is written to disk.
If it helps anyone, I was able to stream from the client to S3 successfully (without memory or disk storage):
https://gist.github.com/mattlockyer/532291b6194f6d9ca40cb82564db9d2a
The server endpoint assumes req is a stream object. I sent a File object from the client, which modern browsers can send as binary data, with the file info set in the headers.
const fileUploadStream = (req, res) => {
  // get "body" args from header
  const { id, fn } = JSON.parse(req.get('body'));
  const Key = id + '/' + fn; // upload to s3 folder "id" with filename === fn
  const params = {
    Key,
    Bucket: bucketName, // set somewhere
    Body: req, // req is a stream
  };
  s3.upload(params, (err, data) => {
    if (err) {
      res.send('Error Uploading Data: ' + JSON.stringify(err) + '\n' + JSON.stringify(err.stack));
    } else {
      res.send(Key);
    }
  });
};
Yes, putting the file info in the headers breaks convention, but if you look at the gist it's much cleaner than anything else I found using streaming libraries, multer, busboy etc...
+1 for pragmatism and thanks to @SalehenRahman for his help.
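For context, a hypothetical client-side call matching the header-parsing above might look like this (the endpoint path and id value are made up; file is a File taken from an <input type="file">):
// Send the raw File as the request body and the metadata the endpoint expects in a 'body' header.
fetch('/api/upload', {
  method: 'POST',
  headers: { body: JSON.stringify({ id: 'user-123', fn: file.name }) },
  body: file,
}).then((res) => res.text())
  .then((key) => console.log('Uploaded as', key));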
I'm using the s3-upload-stream module in a working project here.
There are also some good examples from @raynos in his http-framework repository.
Alternatively you can look at https://github.com/minio/minio-js. It has a minimal set of abstracted APIs implementing the most commonly used S3 calls.
Here is an example of a streaming upload.
$ npm install minio
$ cat >> put-object.js << EOF
var Minio = require('minio')
var fs = require('fs')
// find out your s3 end point here:
// http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
var s3Client = new Minio({
  url: 'https://<your-s3-endpoint>',
  accessKey: 'YOUR-ACCESSKEYID',
  secretKey: 'YOUR-SECRETACCESSKEY'
})
// Read the local file as a stream and pass its size to putObject.
var fileStream = fs.createReadStream('your_localfile.zip');
fs.stat('your_localfile.zip', function(e, stat) {
  if (e) {
    return console.log(e)
  }
  s3Client.putObject('mybucket', 'hello/remote_file.zip', 'application/octet-stream', stat.size, fileStream, function(e) {
    return console.log(e) // should be null
  })
})
EOF
putObject() here is a fully managed single function call; for file sizes over 5MB it automatically does a multipart upload internally. You can resume a failed upload as well, and it will start from where it left off by verifying previously uploaded parts.
Additionally, this library is isomorphic and can be used in browsers as well.
