I want to post a large amount of data (around 10 MB) to a Node.js (LoopBack) API server. My requirement is that the Node server must not miss any API request coming towards it, even if it is processing other data at the same time. This API will be called frequently from a scheduler.
Since there is a limit in config.json in the LoopBack folder structure that specifies the maximum size of data that can be sent, are there any challenges in posting this much data to an API URL (POST method)?
Or is there any mechanism for dealing with large amounts of data, so that server performance is not affected while processing it?
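For context, the limit referred to above is the remoting body-parser limit that LoopBack 3 reads from server/config.json; it is typically raised along these lines (the 50mb value is only an example, not a recommendation):

{
  "restApiRoot": "/api",
  "host": "0.0.0.0",
  "port": 3000,
  "remoting": {
    "json": { "strict": false, "limit": "50mb" },
    "urlencoded": { "extended": true, "limit": "50mb" }
  }
}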
Check out TUS, an open protocol for resumable file uploads. It also has an official JavaScript client library. Here is an upload example borrowed from the JS library's GitHub page:
input.addEventListener("change", function (e) {
  // Get the selected file from the input element
  var file = e.target.files[0]

  // Create a new tus upload
  var upload = new tus.Upload(file, {
    endpoint: "http://localhost:1080/files/",
    retryDelays: [0, 1000, 3000, 5000],
    metadata: {
      filename: file.name,
      filetype: file.type
    },
    onError: function (error) {
      console.log("Failed because: " + error)
    },
    onProgress: function (bytesUploaded, bytesTotal) {
      var percentage = (bytesUploaded / bytesTotal * 100).toFixed(2)
      console.log(bytesUploaded, bytesTotal, percentage + "%")
    },
    onSuccess: function () {
      console.log("Download %s from %s", upload.file.name, upload.url)
    }
  })

  // Start the upload
  upload.start()
})
For handling the upload on the server side, you can find multiple node packages in the npm repo. Here is an example.
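As an illustration (not necessarily the package the answer links to), here is a rough server-side sketch assuming the @tus/server and @tus/file-store packages; option names can vary between versions, so treat this as a starting point and check the package README:

// Hypothetical minimal tus server; assumes @tus/server and @tus/file-store.
const { Server } = require('@tus/server')
const { FileStore } = require('@tus/file-store')

const server = new Server({
  path: '/files',                                   // must match the client endpoint
  datastore: new FileStore({ directory: './files' }) // uploaded files are written here
})

server.listen({ host: '127.0.0.1', port: 1080 })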
I have an AngularJS application, and from this application I send data to my AWS API endpoint:
/**
 * Bulk sync with master
 */
async syncDataWithMaster(): Promise<AxiosResponse<any> | void> {
  try {
    axios.defaults.headers.post.Authorization = token;
    const url = this.endpoint;
    return axios.post(url, compressed, {
      onUploadProgress: progressEvent => {
        console.log('uploading');
      },
      onDownloadProgress: progressEvent => {
        console.log('downloading');
      },
    }).then((response) => {
      if (response.data.status == 'success') {
        return response;
      } else {
        throw new Error('Could not authenticate user');
      }
    });
  } catch (e) {
    // Errors are currently swallowed here.
  }
  return;
}
The API Gateway triggers my Lambda function (Node.js) with the data it received:
exports.handler = async (event) => {
  const localData = JSON.parse(event.body);

  /**
   * Here get data from master and compare with local data and send back any new data
   **/

  const response = {
    statusCode: 200,
    body: JSON.stringify(newData),
  };
  return response;
};
The Lambda function will call the database and get the master data for a user (not shown in the example), and this data is then compared, using various logic, with the local data to determine whether we need to send any new rows back to the local device to be stored/updated. (Before anyone asks, the nature of the application needs the full data set.)
This principle works great for 90% of my users. However, some users have fairly large amounts of data, the current maximum being around 17 MB.
So my question is: is it possible to stream the data to and from the Lambda function? That is, stream the data to the function, process it, and stream it back, so that it is not affected by AWS payload limits?
Or is it possible to somehow begin sending data to the function as a stream, and then, as data becomes available, have it start streaming data back at the same time?
(The data is in JSON format.)
I am wondering what alternatives there are to this solution (it also needs to be fairly quick, 30 seconds at most).
(One other idea I had was that, for data above a certain size, the client first saves it to S3 using a signed URL, then calls the API Gateway endpoint for the Lambda. The Lambda gets the saved file and compares it to the master. If the new data to be returned is over a certain size, it is saved to S3 and a signed URL is returned to the client; the client then downloads the new data and processes it.) However, I am not sure if this is cost effective, and it sounds like execution time may be long.
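For what it's worth, a minimal sketch of the signed-URL part of that idea using the AWS SDK v2; the bucket name, key layout, expiry and content type are placeholders, not something from the question:

// Hypothetical helper inside the Lambda: hand the client a presigned PUT URL
// so large payloads go straight to S3 instead of through API Gateway.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

function getUploadUrl(userId) {
  const params = {
    Bucket: 'my-sync-bucket',                  // placeholder bucket name
    Key: `uploads/${userId}/local-data.json`,  // placeholder key layout
    Expires: 300,                              // URL valid for 5 minutes
    ContentType: 'application/json',
  };
  // The client PUTs its JSON payload to this URL, then calls the API with just
  // the key so the Lambda can read the object from S3.
  return s3.getSignedUrl('putObject', params);
}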
Thanks for any help, been trying to figure this out for a while now
I can't seem to find any up-to-date answers on how to cancel a file upload using MongoDB, NodeJS & Angular. I've only come across some tutorials on how to delete a file, but that is NOT what I am looking for. I want to be able to cancel the file upload process by clicking a button on my front-end.
I am storing my files directly in MongoDB in chunks using the Mongoose, Multer & GridFSBucket packages. I know that I can stop a file's upload process on the front-end by unsubscribing from the subscribable responsible for the upload, but the upload process keeps going in the back-end when I unsubscribe (yes, I have double and triple checked: all the chunks keep getting uploaded until the file is fully uploaded).
Here is my Angular code:
ngOnInit(): void {
  // Upload the file.
  this.sub = this.mediaService.addFile(this.formData).subscribe((event: HttpEvent<any>) => {
    console.log(event);
    switch (event.type) {
      case HttpEventType.Sent:
        console.log('Request has been made!');
        break;
      case HttpEventType.ResponseHeader:
        console.log('Response header has been received!');
        break;
      case HttpEventType.UploadProgress:
        // Update the upload progress!
        this.progress = Math.round(event.loaded / event.total * 100);
        console.log(`Uploading! ${this.progress}%`);
        break;
      case HttpEventType.Response:
        console.log('File successfully uploaded!', event.body);
        this.body = 'File successfully uploaded!';
    }
  },
  err => {
    this.progress = 0;
    this.body = 'Could not upload the file!';
  });
}
**CANCEL THE UPLOAD**
cancel() {
  // Unsubscribe from the upload method.
  this.sub.unsubscribe();
}
Here is my NodeJS (Express) code:
...
// Configure a strategy for uploading files.
const multerUpload = multer({
  // Set the storage strategy.
  storage: storage,
  // Set the size limit for uploading a file to 120MB.
  limits: { fileSize: 1024 * 1024 * 120 },
  // Set the file filter.
  fileFilter: fileFilter
});
// Add new media to the database.
router.post('/add', [multerUpload.single('file')], async (req, res) => {
  return res.status(200).send();
});
What is the right way to cancel the upload without leaving any chunks in the database?
So I have been trying to get to the bottom of this for 2 days now and I believe I have found a satisfying solution:
First, in order to cancel the file upload and delete any chunks that have already been uploaded to MongoDB, you need to adjust the fileFilter in your Multer configuration so that it detects when the request has been aborted and the upload stream has ended, and then reject the upload by passing an error to fileFilter's callback:
// Adjust what files can be stored.
const fileFilter = function (req, file, callback) {
  console.log('The file being filtered', file);

  // If the client aborts the request, end the file stream and reject the
  // upload so no further chunks are written.
  req.on('aborted', () => {
    file.stream.on('end', () => {
      console.log('Cancel the upload');
      callback(new Error('Cancel.'), false);
    });
    file.stream.emit('end');
  });

  // Accept the file so uploads that are not aborted proceed normally.
  callback(null, true);
};
NOTE THAT: When canceling a file upload, you must wait for the changes to show up in your database. The chunks that have already been sent to the database will first have to finish uploading before the canceled file gets deleted. This might take a while depending on your internet speed and the number of bytes that were sent before canceling the upload.
Finally, you might want to set up a route in your backend to delete any chunks from files that have not been fully uploaded to the database (due to some error that might have occurred during the upload). To do that, you'll need to fetch all file IDs from your .chunks collection (by following the method specified in this link) and separate the IDs of files whose chunks were only partially uploaded from the IDs of files that were fully uploaded. Then you'll need to call GridFSBucket's delete() method on those IDs to get rid of the redundant chunks. This step is purely optional and for database maintenance reasons.
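A rough sketch of what such a maintenance routine could look like with the plain MongoDB driver; the 'uploads' bucket name and the way incomplete files are detected (fewer chunks than the recorded length implies) are assumptions, not part of the answer above:

// Hypothetical cleanup job: remove files whose chunk count doesn't match
// what their recorded length requires (i.e. partially uploaded files).
const { MongoClient, GridFSBucket } = require('mongodb');

async function removeIncompleteUploads(uri) {
  const client = await MongoClient.connect(uri);
  const db = client.db();
  const bucket = new GridFSBucket(db, { bucketName: 'uploads' }); // assumed bucket name

  const files = await db.collection('uploads.files').find({}).toArray();
  for (const file of files) {
    const expectedChunks = Math.ceil(file.length / file.chunkSize);
    const actualChunks = await db
      .collection('uploads.chunks')
      .countDocuments({ files_id: file._id });

    if (actualChunks < expectedChunks) {
      // delete() removes both the files document and all of its chunks.
      await bucket.delete(file._id);
    }
  }

  await client.close();
}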
Try using a try/catch approach. There are two ways it can be done:
By calling an API which takes the file that is currently being uploaded as its parameter, and then, on the backend, doing the steps to delete and clear the chunks that are present on the server.
By handling it as an exception: send the file size as a validation. If the backend API has received the file at its full size, it is kept; if the size of the received file is smaller, because the upload was cancelled in between, then do the cleanup steps, where you take the ID of the file's chunks in the Mongo database and clear them. A rough sketch of this idea is shown below.
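A minimal sketch of that size-validation idea, reusing the router and multerUpload from the question; the x-expected-size header, the 'uploads' bucket name, and the req.file.id / req.file.size properties (which a GridFS storage engine such as multer-gridfs-storage is assumed to provide) are all assumptions, not part of the original answer:

// Hypothetical cleanup-on-short-upload route; header name, bucket name and the
// req.file.id / req.file.size properties are assumptions for illustration.
const mongoose = require('mongoose');
const { GridFSBucket } = require('mongodb');

router.post('/add', [multerUpload.single('file')], async (req, res) => {
  const expectedSize = Number(req.get('x-expected-size'));

  if (req.file && expectedSize && req.file.size < expectedSize) {
    // Fewer bytes arrived than the client announced: treat the upload as
    // cancelled and remove the stored file together with its chunks.
    const bucket = new GridFSBucket(mongoose.connection.db, { bucketName: 'uploads' });
    await bucket.delete(req.file.id);
    return res.status(400).send('Upload was cancelled; stored chunks removed.');
  }

  return res.status(200).send();
});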
I have created a Node.js server which does the following:
Uploads media files (videos and images) to the server using multer
If the media is an image, it is resized using sharp
If the media is a video, it is resized and compressed using fluent-ffmpeg
Uploads the files to Firebase Storage for backup
All of this is now working fluently. The problem is that when the size of an uploaded file is big, the request takes a long time to process. So I want to show some progress on the client side, as below:
State 1. The media is uploading -> n%
State 2. The media is compressing
State 3. The media is uploading to cloud -> n%
State 4. Result -> JSON = {status: "ok", uri: .., cloudURI: .., ..}
The Firebase Storage API has functionality like this when creating an upload task, as shown below:
let uploadTask = imageRef.put(blob, { contentType: mime });

uploadTask.on('state_changed', (snapshot) => {
  if (typeof snapshot.bytesTransferred == "number") {
    let progress = (snapshot.bytesTransferred / snapshot.totalBytes) * 100;
    console.log('Upload is ' + progress + '% done');
  }
});
I have found that it is possible to realize this using WebSockets, but I am interested in whether there are other methods to do it.
The problem is also described here: http://www.tugberkugurlu.com/archive/long-running-asynchronous-operations-displaying-their-events-and-progress-on-clients
One of the approaches is covered in "Accessing partial response using AJAX or WebSockets?", but I am looking for a more flexible and professional solution.
I have solved this problem using GraphQL Subscriptions. The same approach can be realized using WebSockets. The steps to solve this problem are as below:
Post files to upload server
Generate a unique operation ID and send it as a response to the client
Ex: response = {op: "A78HNDGS89NSNBDV7826HDJ"}
Create a subscription by opID
Ex: subscription { uploadStatus(op: "A78HNDGS89NSNBDV7826HDJ") { status }}
Every time the status changes, send a request to the GraphQL endpoint, which publishes the data to the pubsub. To send a GraphQL request from the Node.js server you can use https://github.com/prisma-labs/graphql-request
Ex:

const { request } = require('graphql-request');

const GQL_URL = "YOUR_GQL_ENDPOINT";
// The argument name ("message" here) depends on how notify is defined in your schema.
const query = `query {
  notify(message: "Status text goes here")
}`;

request(GQL_URL, query).then(data =>
  console.log(data)
);
The notify resolver function publishes the data to the pubsub:

context.pubsub.publish('uploadStatus', {
  status: "Status text"
});
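For completeness, a rough sketch of what the matching uploadStatus subscription resolver could look like with the graphql-subscriptions package; the field and argument names follow the example above, and including the op id in the published payload (so it can be filtered on) is an assumption:

// Hypothetical subscription resolver; assumes graphql-subscriptions and that
// every publish() call also includes the operation id (op) in its payload.
const { PubSub, withFilter } = require('graphql-subscriptions');

const pubsub = new PubSub();

const resolvers = {
  Subscription: {
    uploadStatus: {
      subscribe: withFilter(
        () => pubsub.asyncIterator('uploadStatus'),
        // Only deliver events that belong to the operation the client subscribed to.
        (payload, variables) => payload.op === variables.op
      ),
      // Shape the payload into the { status } field the client asked for.
      resolve: (payload) => ({ status: payload.status }),
    },
  },
};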
If you have a more complicated architecture, you can use message brokers like RabbitMQ, Kafka, etc.
If someone knows other solutions, please let us know )
I have created around 500 signed URLs for objects located in S3. Now I try to download those objects from the signed URLs in a loop:
await Promise.all(signedUrls.map(async (url) => {
  const val = await request(url, (error, response) => {
    if (!error) {
      console.log('Downloaded successfully');
    } else {
      console.log('error in downloading', error.message);
    }
  });
}));
I get this error for some of the URLs.
error in downloading getaddrinfo ENOTFOUND s3.amazonaws.com s3.amazonaws.com:443
I know all the signed URLs created are correct, which I checked individually, but I suspect there is an issue with S3 when downloading the files.
I need to check whether there is any limit on S3 for requesting too many files.
S3 has no practical limit on the number of downloads or the number of concurrent downloads. In theory, there must be a limit because they have a finite amount of hardware in the AWS data centers, but that limit is so high that in practice you cannot reach it.
I am trying to find a solution for streaming files to Amazon S3 from a Node.js server, with these requirements:
Don't store a temp file on the server or in memory. Buffering can be used for uploading, but only up to some limit, not the complete file.
No restriction on uploaded file size.
Don't freeze the server until the complete file has been uploaded, because with heavy file uploads other requests' waiting times would unexpectedly increase.
I don't want to use direct file upload from the browser, because the S3 credentials would need to be shared in that case. Another reason to upload the file from the Node.js server is that some authentication may also need to be applied before uploading the file.
I tried to achieve this using node-multiparty, but it was not working as expected. You can see my solution and the issue at https://github.com/andrewrk/node-multiparty/issues/49. It works fine for small files but fails for a file of size 15MB.
Any solution or alternative?
You can now use streaming with the official Amazon SDK for Node.js; see the section "Uploading a File to an Amazon S3 Bucket" or their example on GitHub.
What's even more awesome, you can finally do so without knowing the file size in advance. Simply pass the stream as the Body:
var AWS = require('aws-sdk');
var fs = require('fs');
var zlib = require('zlib');

var body = fs.createReadStream('bigfile').pipe(zlib.createGzip());
var s3obj = new AWS.S3({params: {Bucket: 'myBucket', Key: 'myKey'}});

s3obj.upload({Body: body})
  .on('httpUploadProgress', function(evt) { console.log(evt); })
  .send(function(err, data) { console.log(err, data); });
For your information, the v3 SDK was published with a dedicated module to handle this use case: https://www.npmjs.com/package/@aws-sdk/lib-storage
Took me a while to find it.
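For reference, a minimal sketch of how that module is typically used; the bucket, key, file name and region values are placeholders:

// Streaming upload with the v3 SDK's Upload helper; values are placeholders.
const { S3Client } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');
const fs = require('fs');
const zlib = require('zlib');

const upload = new Upload({
  client: new S3Client({ region: 'us-east-1' }),
  params: {
    Bucket: 'myBucket',
    Key: 'myKey',
    // As with the v2 example above, the body can be a stream of unknown length.
    Body: fs.createReadStream('bigfile').pipe(zlib.createGzip()),
  },
});

upload.on('httpUploadProgress', (progress) => console.log(progress));

upload.done()
  .then((result) => console.log('Upload finished:', result))
  .catch((err) => console.error('Upload failed:', err));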
Give https://www.npmjs.org/package/streaming-s3 a try.
I used it for uploading several big files in parallel (>500Mb), and it worked very well.
It is very configurable and also lets you track upload statistics.
You don't need to know the total size of the object, and nothing is written to disk.
If it helps anyone: I was able to stream from the client to S3 successfully (without memory or disk storage):
https://gist.github.com/mattlockyer/532291b6194f6d9ca40cb82564db9d2a
The server endpoint assumes req is a stream object; I sent a File object from the client, which modern browsers can send as binary data, with the file info set in the headers.
const fileUploadStream = (req, res) => {
  // get "body" args from header
  const { id, fn } = JSON.parse(req.get('body'));
  const Key = id + '/' + fn; // upload to s3 folder "id" with filename === fn
  const params = {
    Key,
    Bucket: bucketName, // set somewhere
    Body: req, // req is a stream
  };
  s3.upload(params, (err, data) => {
    if (err) {
      res.send('Error Uploading Data: ' + JSON.stringify(err) + '\n' + JSON.stringify(err.stack));
    } else {
      res.send(Key);
    }
  });
};
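As a hedged illustration of the client side described above; the endpoint path, the id value and the "body" header name are assumptions that mirror the server snippet, not part of the original gist:

// Hypothetical browser-side call: the File object is sent as the raw request
// body and the id/filename travel in a "body" header, matching the endpoint above.
async function uploadFile(file) {
  const response = await fetch('/upload', {
    method: 'POST',
    headers: { body: JSON.stringify({ id: 'some-folder-id', fn: file.name }) },
    body: file, // browsers send the File as binary data
  });
  // The server responds with the S3 Key it uploaded to.
  return response.text();
}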
Yes, putting the file info in the headers breaks convention, but if you look at the gist it's much cleaner than anything else I found using streaming libraries, multer, busboy, etc.
+1 for pragmatism, and thanks to @SalehenRahman for his help.
I'm using the s3-upload-stream module in a working project here.
There are also some good examples from @raynos in his http-framework repository.
Alternatively you can look at https://github.com/minio/minio-js. It has a minimal set of abstracted APIs implementing the most commonly used S3 calls.
Here is an example of a streaming upload.
$ npm install minio
$ cat >> put-object.js << EOF

var Minio = require('minio')
var fs = require('fs')

// find out your s3 end point here:
// http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

var s3Client = new Minio({
  url: 'https://<your-s3-endpoint>',
  accessKey: 'YOUR-ACCESSKEYID',
  secretKey: 'YOUR-SECRETACCESSKEY'
})

// Stream the local file instead of buffering it in memory.
var fileStream = fs.createReadStream('your_localfile.zip');
fs.stat('your_localfile.zip', function(e, stat) {
  if (e) {
    return console.log(e)
  }
  s3Client.putObject('mybucket', 'hello/remote_file.zip', 'application/octet-stream', stat.size, fileStream, function(e) {
    return console.log(e) // should be null
  })
})
EOF
putObject() here is a fully managed single function call: for file sizes over 5MB it automatically performs a multipart upload internally. You can also resume a failed upload, and it will start from where it left off by verifying previously uploaded parts.
Additionally, this library is isomorphic and can be used in browsers as well.