createBlockBlobFromStream() is overriding existing content instead of appending - node.js

We work with very large files, so we divide them into chunks of 20 MB and then call the upload function to upload them to blob storage. I am calling upload() in Node.js, where I found that I am missing something during the upload. Only 20 MB ends up stored each time, so I suspect the code is overwriting the content rather than appending the stream.
Can somebody help me fix it?
const chunkSize = Number(request.headers["x-content-length"]);
const userrole = request.headers["x-userrole"];
const pathname = request.headers["x-pathname"];
var form = new multiparty.Form();
form.parse(request, function (err, fields, files) {
  if (files && files["payload"] && files["payload"].length > 0) {
    var fileContent = fs.readFileSync(files["payload"][0].path);
    // log.error('fields', fields['Content-Type'])
    fs.unlink(files["payload"][0].path, function (err) {
      if (err) {
        log.error("Error in unlink payload:" + err);
      }
    });
    var size = fileContent.length;
    if (size !== chunkSize) {
      sendBadRequest(response, "Chunk uploading was not completed");
      return;
    }
    // converting the chunk (a Buffer) to a readable stream
    const stream = Readable.from(fileContent);
    var options = {
      contentSettings: {
        contentType: fields['Content-Type']
      }
    };
    blobService.createBlockBlobFromStream(containerName, pathname, stream, size, options, error => {
    });
  }
});
Headers:
X-id: 6023f6f53601233c080b1369
X-Chunk-Id: 38
X-Content-Id: 43bfdbf4ddd1d7b5cd787dc212be8691d8dd147017a2344cb0851978b2d983c075c26c6082fd27a5147742b030856b0d71549e4d208d1c3c9d94f957e7ed1c92
X-pathname: 6023f6ae3601233c080b1365/spe10_lgr311_2021-02-10_09-08-37/output/800mb.inc
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryqodrlQNFytNS9wAc
X-Content-Name: 800mb.inc

The issue is that you're using createBlockBlobFromStream, which will overwrite the contents of a blob. This method is used to create a blob in a single request (i.e. the complete blob data is passed as input). From the documentation here:
Uploads a block blob from a stream. If the blob already exists on the
service, it will be overwritten. To avoid overwriting and instead
throw an error if the blob exists, please pass in an accessConditions
parameter in the options object.
In your case, you're uploading chunks of the data. What you would need to do is use the createBlockFromStream method for each chunk that you're uploading.
Once all chunks are uploaded, you would need to call the commitBlocks method to create the blob.
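For illustration, here is a minimal sketch of what the chunk-handling part of your handler could look like with createBlockFromStream, still using the legacy azure-storage SDK from your snippet. The block id derived from the X-Chunk-Id header and the response shape are assumptions of mine, not part of your code:
// Sketch only: build a fixed-length block id from the chunk index the client
// already sends in the X-Chunk-Id header, so every block id has the same length.
const chunkId = Number(request.headers["x-chunk-id"]);
const blockId = "block-" + ("000000" + chunkId).slice(-6); // e.g. "block-000038"

// Stage this chunk as an uncommitted block instead of overwriting the whole blob.
blobService.createBlockFromStream(blockId, containerName, pathname, stream, size, error => {
  if (error) {
    log.error("Error uploading block " + blockId + ": " + error);
    return sendBadRequest(response, "Chunk uploading failed");
  }
  // Tell the client the block was staged so it can keep the id for the final commit.
  response.send({ blockId: blockId });
});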
UPDATE
How can we generate the blockId?
A block id is simply a string. The key thing to remember is that when you're calling createBlockFromStream, the block id of each block you send must have the same length. You can't use 1,2,3,4,5,6,7,8,9,10,11... for example; you will have to use something like 01,02,03,04,05,06,07,08,09,10,11... so that they're all of the same length. You can use a GUID for that purpose. Also, the maximum length of a block id is 50 characters.
should it be unique for all?
Yes. For a blob, the block ids must be unique otherwise it will overwrite the content of a previous block uploaded with the same id.
Can you show example code so that I can try to implement it in a similar way?
Please take a look here.
Basically the idea is very simple: on the client side, you're chunking the file and sending each chunk separately. Apart from sending that data, you will also send a block id string, and you will need to keep that block id on your client side. You repeat the same process for all the chunks of your file.
Once all chunks are uploaded successfully, you will make one more request to your server and send the list of all the block ids. Your server at that time will call commitBlocks to create the blob.
Please note that the order of block ids in your last request is important. Azure Storage Service will use this information to stitch the blob together.
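A rough sketch of that final commit request on the server could look like the following; the route shape and the blockIds field in the JSON body are assumptions for illustration:
// Sketch only: the client posts the ordered list of block ids once every chunk is staged.
// The order of this list is what Azure Storage uses to stitch the blob together.
const blockList = {
  LatestBlocks: request.body.blockIds // e.g. ["block-000001", "block-000002", ...]
};

blobService.commitBlocks(containerName, pathname, blockList, error => {
  if (error) {
    log.error("Error committing block list: " + error);
    return sendBadRequest(response, "Could not finalize the upload");
  }
  response.send({ status: "Blob created" });
});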

Related

Lambda stream encoded string

I stored the audio file in an array buffer and encoded it with base64. I need to send the data from the Lambda to a React client. For larger audio files, I'm facing a Lambda payload limit error.
Is there any way to stream the data in chunks from the Lambda to the client?
function readFile(filepath, callback) {
  // Uint8Array
  fs.readFile(filepath, (err, data) => {
    // data is a Buffer object
    if (err) console.log(err);
    callback(data);
  });
}

readFile(`${outputFile}`, function (data) {
  try {
    let base64enc = base64.encode(data);
    responseBody.message = base64enc;
    status = statusCodes.OK;
    return sendResponse(status, responseBody); // Sending response
  } catch (err) {
    console.log("error " + err);
    reject(err);
  }
});
No, there isn't. Each Lambda function call can return one payload and Lambda function calls are independent. I suggest two possible solutions.
Request a specific chunk. You can call the Lambda function multiple times, with each call requesting a specific chunk of the data, and merge them together into one piece of data (a file, in your case) on the frontend.
Use S3. You usually handle media files using S3. Assuming you already have the audio file on S3, you generate and return a pre-signed GET URL in Lambda, and use that URL to get the object on the frontend. (You can refer to the code in "Presigned URL generation code times-out as Lambda, works locally" or other sources.) You can also upload audio files by getting a pre-signed PUT URL in Lambda and using that URL on the frontend to upload.
I would suggest the second solution because it is a more standard way of dealing with media files.
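For the second approach, here is a minimal sketch of the Lambda side using the AWS SDK for JavaScript (v2); the bucket name is hypothetical and the handler assumes the object key arrives via API Gateway query parameters:
// Sketch only: return a short-lived pre-signed GET URL instead of the file itself.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = async (event) => {
  const url = s3.getSignedUrl('getObject', {
    Bucket: 'my-audio-bucket',            // hypothetical bucket name
    Key: event.queryStringParameters.key, // object key requested by the client
    Expires: 900                          // URL valid for 15 minutes
  });

  // The response payload is tiny; the client downloads the audio directly from S3.
  return {
    statusCode: 200,
    body: JSON.stringify({ url })
  };
};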

Cancel File Upload: Multer, MongoDB

I can't seem to find any up-to-date answers on how to cancel a file upload using Mongo, NodeJS & Angular. I've only come across some tutorials on how to delete a file, but that is NOT what I am looking for. I want to be able to cancel the file uploading process by clicking a button on my front-end.
I am storing my files directly in MongoDB in chunks using the Mongoose, Multer & GridFSBucket packages. I know that I can stop a file's upload on the front-end by unsubscribing from the subscribable responsible for the upload, but the upload process keeps going in the back-end when I unsubscribe (yes, I have double and triple checked: all the chunks keep getting uploaded until the file is fully uploaded).
Here is my Angular code:
ngOnInit(): void {
  // Upload the file.
  this.sub = this.mediaService.addFile(this.formData).subscribe((event: HttpEvent<any>) => {
    console.log(event);
    switch (event.type) {
      case HttpEventType.Sent:
        console.log('Request has been made!');
        break;
      case HttpEventType.ResponseHeader:
        console.log('Response header has been received!');
        break;
      case HttpEventType.UploadProgress:
        // Update the upload progress!
        this.progress = Math.round(event.loaded / event.total * 100);
        console.log(`Uploading! ${this.progress}%`);
        break;
      case HttpEventType.Response:
        console.log('File successfully uploaded!', event.body);
        this.body = 'File successfully uploaded!';
    }
  },
  err => {
    this.progress = 0;
    this.body = 'Could not upload the file!';
  });
}

CANCEL THE UPLOAD:

cancel() {
  // Unsubscribe from the upload method.
  this.sub.unsubscribe();
}
Here is my NodeJS (Express) code:
...
// Configure a strategy for uploading files.
const multerUpload = multer({
  // Set the storage strategy.
  storage: storage,
  // Set the size limit for uploading a file to 120 MB.
  limits: 1024 * 1024 * 120,
  // Set the file filter.
  fileFilter: fileFilter
});

// Add new media to the database.
router.post('/add', [multerUpload.single('file')], async (req, res) => {
  return res.status(200).send();
});
What is the right way to cancel the upload without leaving any chunks in the database?
So I have been trying to get to the bottom of this for 2 days now and I believe I have found a satisfying solution:
First, in order to cancel the file upload and delete any chunks that have already been uploaded to MongoDB, you need to adjust the fileFilter in your multer configuration so that it detects when the request has been aborted and the upload stream has ended, and then rejects the upload by passing an error to fileFilter's callback:
// Adjust what files can be stored.
const fileFilter = function (req, file, callback) {
  console.log('The file being filtered', file);
  req.on('aborted', () => {
    file.stream.on('end', () => {
      console.log('Cancel the upload');
      callback(new Error('Cancel.'), false);
    });
    file.stream.emit('end');
  });
};
NOTE THAT: when canceling a file upload, you must wait for the changes to show up in your database. The chunks that have already been sent will first have to finish uploading before the canceled file gets deleted from the database. This might take a while depending on your internet speed and the number of bytes that were sent before canceling the upload.
Finally, you might want to set up a route in your backend to delete any chunks from files that have not been fully uploaded to the database (due to some error that might have occurred during the upload). To do that, you'll need to fetch all the file IDs from your .chunks collection (by following the method specified in this link) and separate the IDs of the files whose chunks were only partially uploaded from the IDs of the files that were fully uploaded. Then you'll need to call GridFSBucket's delete() method on those IDs to get rid of the redundant chunks. This step is purely optional and for database maintenance reasons.
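For what it's worth, a minimal sketch of such a maintenance route could look like this, assuming mongoose is already connected and GridFS uses the default "fs" bucket; instead of calling delete() per file, it simply removes the orphaned chunks directly:
const mongoose = require('mongoose');

// Sketch only: remove chunks whose parent file never finished uploading.
router.delete('/cleanup', async (req, res) => {
  const db = mongoose.connection.db;

  // Files that finished uploading have a document in fs.files.
  const completeIds = await db.collection('fs.files').distinct('_id');

  // Any chunk whose files_id is not among them belongs to a partial upload.
  const result = await db.collection('fs.chunks').deleteMany({
    files_id: { $nin: completeIds }
  });

  return res.status(200).send({ orphanedChunksRemoved: result.deletedCount });
});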
Try a try/catch approach. There are two ways it can be done:
1) By calling an API that takes the file currently being uploaded as its parameter, and then, on the backend, deleting and clearing the chunks that are already present on the server.
2) By handling it as an exception: send the file size as a validation. If the backend API has received the file in its full size, keep it; if the received size is smaller because the upload was cancelled in between, run the clean-up steps, i.e. take the file's id and clear its chunks from the Mongo database. A sketch of this option follows below.
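A very rough sketch of that second option, reusing the router and multerUpload from the question; the "x-expected-size" header is hypothetical, and it assumes a GridFS storage engine (such as multer-gridfs-storage) that exposes the stored file's id and size on req.file:
// Sketch only: compare the stored size with the size the client said it would send.
const mongoose = require('mongoose');

router.post('/add', [multerUpload.single('file')], async (req, res) => {
  const expectedSize = Number(req.headers['x-expected-size']); // hypothetical header
  if (req.file && req.file.size < expectedSize) {
    // The upload was cut short: delete the partially stored file and its chunks.
    const bucket = new mongoose.mongo.GridFSBucket(mongoose.connection.db);
    await bucket.delete(req.file.id);
    return res.status(400).send('Upload was cancelled before completion.');
  }
  return res.status(200).send();
});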

Respond with two files data on readFile - Node

I'm using Node to respond to clients with two files. For now, I'm using one endpoint for each file, because I can't figure out how to pass more than one in a single response.
Here's the function that responds with the file:
exports.chartBySHA1 = function (req, res, next, id) {
  var dir = './curvas/' + id + '/curva.txt'; // id = 1e4cf04ad583e483c27b40750e6d1e0302aff058
  fs.readFile(dir, function read(err, data) {
    if (err) {
      // "Could not fetch the curve."
      return res.status(400).send("Não foi possível buscar a curva.");
    }
    content = data;
    res.status(200).send(content);
  });
};
Besides that, I need to change the default name of the file: when I hit that endpoint, the file name comes back as 1e4cf04ad583e483c27b40750e6d1e0302aff058, even though I'm sending the content of 'curva.txt'.
Does anyone have any tips?
Q: How do I pass the contents of more than one file back to a user without having to create individual endpoints?
A: There are a few ways you can do this.
If the content of each file is not huge, then the easiest way out is to read in all of the contents and transmit them back as a JavaScript key-value object, e.g.
let data = {
  file1: "This is some text from file 1",
  file2: "Text for second file"
};
res.send(data);
res.end();
If the content is particularly large, then you can stream the data across to the client; while doing so, you could add some metadata or hints to tell the client what they are about to receive and where each file ends.
There are probably libraries on GitHub that can do the latter for you already, so I would suggest you look around before designing/writing your own.
The former method is the easiest.
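As a concrete sketch of the former approach applied to the question's setup (the handler name and the second file name are hypothetical):
const fs = require('fs').promises;

exports.chartsBySHA1 = async function (req, res) {
  const dir = './curvas/' + req.params.id;
  try {
    // Read both files in parallel and send them back in one key-value payload.
    const [curva, outra] = await Promise.all([
      fs.readFile(dir + '/curva.txt', 'utf8'),
      fs.readFile(dir + '/outra.txt', 'utf8') // hypothetical second file
    ]);
    res.status(200).send({ 'curva.txt': curva, 'outra.txt': outra });
  } catch (err) {
    res.status(400).send('Não foi possível buscar a curva.'); // "Could not fetch the curve."
  }
};
Using the file names as the keys also tells the client what to call each file when saving it, which helps with the renaming issue mentioned in the question.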

How to make the client download a very large file that is generated on the fly

I have an export function that read the entire database and create a .xls file with all the records. Then the file is sent to the client.
Of course, exporting the full database takes a lot of time and the request will soon end in a timeout error.
What is the best solution to handle this case?
I heard something about making a queue with Redis for example but this will require two requests: one for starting the job that will generate the file and the second to download the generated file.
Is this possible with a single request from the client?
Excel Export:
Use Streams. Following is a rough idea of what might be done:
Use the exceljs module, because it has a streaming API aimed at this exact problem.
var Excel = require('exceljs')
Since we are trying to initiate a download, write the appropriate headers to the response.
res.status(200);
res.setHeader('Content-disposition', 'attachment; filename=db_dump.xls');
res.setHeader('Content-type', 'application/vnd.ms-excel');
Create a workbook backed by the streaming Excel writer. The stream given to the writer is the server response.
var options = {
  stream: res, // write to server response
  useStyles: false,
  useSharedStrings: false
};
var workbook = new Excel.stream.xlsx.WorkbookWriter(options);
Now the output streaming flow is all set up. For the input streaming, prefer a DB driver that gives query results/cursors as a stream.
Define an async function that dumps one table to one worksheet:
var tableToSheet = function (name, done) {
  var str = dbDriver.query('SELECT * FROM ' + name).stream();
  var sheet = workbook.addWorksheet(name);
  str.on('data', function (d) {
    sheet.addRow(d).commit(); // format object if required
  });
  str.on('end', function () {
    sheet.commit();
    done();
  });
  str.on('error', function (err) {
    done(err);
  });
};
Now, let's export some db tables using the async module's mapSeries:
async.mapSeries(['cars', 'planes', 'trucks'], tableToSheet, function (err) {
  if (err) {
    // log error
  }
  res.end();
});
CSV Export:
For CSV export of a single table/collection, the fast-csv module can be used:
// response headers as usual
res.status(200);
res.setHeader('Content-disposition', 'attachment; filename=mytable_dump.csv');
res.setHeader('Content-type', 'text/csv');
// create csv stream
var csv = require('fast-csv');
var csvStr = csv.createWriteStream({headers: true});
// open database stream
var dbStr = dbDriver.query('SELECT * from mytable').stream();
// connect the streams
dbStr.pipe(csvStr).pipe(res);
You are now streaming data from DB to HTTP response, converting it into xls/csv format on the fly. No need to buffer or store the entire data in memory or in a file.
You do not have to send the whole file at once; you can send it in chunks (line by line, for example). Just use res.write(chunk) for each piece and res.end() at the finish to mark the response as completed.
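As a small illustration of that idea (the Express app and the getRows() async generator yielding one CSV line per database row are assumptions):
// Sketch only: stream the export line by line instead of buffering the whole file.
app.get('/export', async (req, res) => {
  res.setHeader('Content-disposition', 'attachment; filename=db_dump.csv');
  res.setHeader('Content-type', 'text/csv');

  for await (const line of getRows()) { // hypothetical async generator over DB rows
    res.write(line + '\n');             // flush each row to the client as it is produced
  }

  res.end(); // mark the download as complete
});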
You can either send the file information as a stream, sending each individual chunk as it gets created via res.write(chunk), or, if sending the file chunk by chunk is not an option, and you have to wait for the entire file before sending any information, you can always keep the connection open by setting the timeout duration to Infinity or any value you think will be high enough to allow the file to be created. Then set up a function that creates the .xls file and either:
1) Accepts a callback that receives the data output as an argument once it's ready, sends that data, and then closes the connection; or
2) Returns a promise that resolves with the data output once it's ready, allowing you to send the resolved value and close the connection just like with the callback version.
It would look something like this:
function xlsRouteHandler(req, res){
res.setTimeout(Infinity) || res.socket.setTimeout(Infinity)
//callback version
createXLSFile(...fileCreationArguments, function(finishedFile){
res.end(finishedFile)
})
//promise version
createXLSFile(...fileCreationArguments)
.then(finishedFile => res.end(finishedFile))
}
If you still find yourself concerned about timing out, you can always set an interval timer to dispatch an occasional res.write() message to prevent a timeout on the server connection and then cancel that interval once the final file content is ready to be sent.
Refer to this link, which uses Jedis (a Redis Java client). The key to this is the RPOPLPUSH command:
https://blog.logentries.com/2016/05/queuing-tasks-with-redis/

Node.js stream upload directly to Google Cloud Storage

I have a Node.js app running on a Google Compute VM instance that receives file uploads directly from POST requests (not via the browser) and streams the incoming data to Google Cloud Storage (GCS).
I'm using Restify b/c I don't need the extra functionality of Express and because it makes it easy to stream the incoming data.
I create a random filename for the file, take the incoming req and toss it to a neat little Node wrapper for GCS (found here: https://github.com/bsphere/node-gcs) which makes a PUT request to GCS. The documentation for GCS using PUT can be found here: https://developers.google.com/storage/docs/reference-methods#putobject ... it says Content-Length is not necessary if using chunked transfer encoding.
Good news: the file is being created inside the appropriate GCS storage "bucket"!
Bad News:
I haven't figured out how to get the incoming file's extension from Restify (notice I'm setting '.jpg' and the content-type manually).
The file is experiencing slight corruption (almost certainly due to something I'm doing wrong with the PUT request). If I download the POSTed file from Google, OS X tells me it's damaged... BUT, if I use Photoshop, it opens and looks just fine.
Update / Solution
As pointed out by vkurchatkin, I needed to parse the request object instead of just piping the whole thing to GCS. After trying out the lighter busboy module, I decided it was just a lot easier to use multiparty. For dynamically setting the Content-Type, I simply used Mimer (https://github.com/heldr/mimer), referencing the file extension of the incoming file. It's important to note that since we're piping the part object, the part.headers must be cleared out. Otherwise, unintended info, specifically content-type, will be passed along and can/will conflict with the content-type we're trying to set explicitly.
Here's the applicable, modified code:
var restify = require('restify'),
    server = restify.createServer(),
    GAPI = require('node-gcs').gapitoken,
    GCS = require('node-gcs'),
    multiparty = require('multiparty'),
    Mimer = require('mimer');

server.post('/upload', function (req, res) {
  var form = new multiparty.Form();
  form.on('part', function (part) {
    var fileType = '.' + part.filename.split('.').pop().toLowerCase();
    var fileName = Math.random().toString(36).slice(2) + fileType;

    // clear out the part's headers to prevent conflicting data being passed to GCS
    part.headers = null;

    var gapi = new GAPI({
      iss: '-- your -- #developer.gserviceaccount.com',
      scope: 'https://www.googleapis.com/auth/devstorage.full_control',
      keyFile: './key.pem'
    },
    function (err) {
      if (err) { console.log('google cloud authorization error: ' + err); }

      var headers = {
        'Content-Type': Mimer(fileType),
        'Transfer-Encoding': 'Chunked',
        'x-goog-acl': 'public-read'
      };

      var gcs = new GCS(gapi);
      gcs.putStream(part, myBucket, '/' + fileName, headers, function (gerr, gres) {
        console.log('file should be there!');
      });
    });
  });

  // kick off parsing of the incoming multipart request
  form.parse(req);
});
You can't use the raw req stream since it yields the whole request body, which is multipart. You need to parse the request with something like multiparty, which gives you a readable stream and all the metadata you need.
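As an illustration of that idea with the current official client library (@google-cloud/storage) instead of node-gcs, a minimal sketch could look like this; the bucket name is hypothetical and default application credentials are assumed:
// Sketch only: parse the multipart request with multiparty and pipe each part
// straight into a GCS write stream.
const multiparty = require('multiparty');
const { Storage } = require('@google-cloud/storage');

const storage = new Storage();                     // uses default credentials
const bucket = storage.bucket('my-upload-bucket'); // hypothetical bucket name

server.post('/upload', (req, res) => {
  const form = new multiparty.Form();
  form.on('part', (part) => {
    const ext = part.filename.split('.').pop().toLowerCase();
    const gcsFile = bucket.file(Math.random().toString(36).slice(2) + '.' + ext);

    part.pipe(gcsFile.createWriteStream({ resumable: false }))
      .on('error', (err) => res.send(500, err.message))
      .on('finish', () => res.send(200, { name: gcsFile.name }));
  });
  form.parse(req);
});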
