How to gunzip a stream in Node.js?

I'm trying to accomplish quite an easy task, but I'm a little bit confused and got stuck using zlib in Node.js. I'm building functionality that involves downloading a gzipped file from AWS S3, unzipping it, and reading it line by line. I want to accomplish all of this using streams, as I believe that's possible in Node.js.
Here is my current code base:
// downloading zipped file from aws s3:
// params are configured correctly to access my aws s3 bucket and file
s3.getObject(params, function(err, data) {
  if (err) {
    console.log(err);
  } else {
    // trying to unzip received stream:
    // data.Body is a buffer from s3
    zlib.gunzip(data.Body, function(err, unzippedStream) {
      if (err) {
        console.log(err);
      } else {
        // reading the unzipped stream line by line:
        var lineReader = readline.createInterface({
          input: unzippedStream
        });
        lineReader.on('line', function(lines) {
          console.log(lines);
        });
      }
    });
  }
});
I get an error saying:
readline.js:113
input.on('data', ondata);
^
TypeError: input.on is not a function
I believe the problem might be in the unzipping process, but I'm not sure what is wrong; any help would be appreciated.

I don't have an S3 account to test with, but reading the docs suggests that s3.getObject() can return a stream, in which case I think that this might work:
var lineReader = readline.createInterface({
  input: s3.getObject(params).pipe(zlib.createGunzip())
});
lineReader.on('line', function(lines) {
  console.log(lines);
});
EDIT: looks like the API may have changed, and you're now required to instantiate a stream object manually before you can pipe it through anything else:
s3.getObject(params).createReadStream().pipe(...)
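Putting those pieces together, a minimal sketch of the whole chain might look like this (assuming the same s3 client and params from the question, and the aws-sdk v2 API):
var zlib = require('zlib');
var readline = require('readline');

var lineReader = readline.createInterface({
  // raw gzipped bytes from S3 as a stream, decompressed on the fly
  input: s3.getObject(params).createReadStream().pipe(zlib.createGunzip())
});

lineReader.on('line', function(line) {
  console.log(line);
});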

Related

How to work with node.js stream pipeline in AWS lambda function?

I am trying to parse and save a CSV file from an S3 bucket to a DB using an AWS Lambda function. It works perfectly and saves the file to the DB when run locally, but in AWS it doesn't work: no issues, no errors, it just looks like the piping doesn't work. I tried subscribing to the 'data' event of readableStream - nothing. The streams are initialized successfully. To read a file from S3 I use the SDK method s3.getObject(params).createReadStream().
Lambda Node.js runtime - 14.x
To parse csv I use - csv-parser
Could you please help me understand where the problem is?
async processSaveFile(bucket: string, key: string) {
  const readableStream = this.s3Client.getObjectReadStream({ Bucket: bucket, Key: key })
  console.log('SAVE FILE START')
  const transformStream = new CurriculumTransformStream()
  const writableStream = new CurriculumWriteStream()
  await pipeline(
    readableStream,
    csv(),
    transformStream,
    writableStream,
    (err) => {
      if (err) {
        console.log('err = ', err)
        return
      }
    }
  )
  // readableStream.pipe(csv()).pipe(transformStream).pipe(writableStream)
  console.log('SAVE FILE END')
  return
}
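One thing worth noting for reference: if the pipeline used here is the callback version from the 'stream' module (which the trailing error callback suggests), awaiting it does not actually wait for the streams to finish. A minimal sketch of the promisified form on Node 14, assuming the csv-parser import and the custom streams from the snippet above:
const { pipeline } = require('stream')
const { promisify } = require('util')
const csv = require('csv-parser')
const pipelineAsync = promisify(pipeline)

async function processSaveFile(readableStream, transformStream, writableStream) {
  // no callback argument here; errors reject the returned promise instead
  await pipelineAsync(readableStream, csv(), transformStream, writableStream)
}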

Get audio duration in aws lambda nodejs function

I am currently trying to get the duration/length of an audio file read from S3. I tried a bunch of different ways, but can't seem to find an efficient one that works. Currently, I am storing the audio files in the /tmp/ folder and then trying to read them, but that doesn't seem to work. I am also working with .mp4a, not .mp3, but was initially testing the code below with URL strings from S3 and got 0:00 returned when the audio was read. I also tried getAudioDurationInSeconds, but that tells me "error locating ffprobe". Any pointers would be greatly appreciated.
s3.getObject({ Bucket: bucketName, Key: `audio/file` }, function (err, data) {
  if (err) {
    console.error(err.code, "-", err.message);
  }
  fs.writeFile(`/tmp/file`, data.Body)
    .then(() => { // This ensures that your mp3Duration function gets called after the file has been written
      // getAudioDurationInSeconds(`/tmp/file`).then((duration) => {
      //   console.log(duration);
      // });
      mp3Duration(`/tmp/$file`, function (err, duration) {
        if (err) return console.log(err.message);
        console.log('Your file is ' + duration + ' seconds long');
      }).catch(console.error);
    });
});
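For what it's worth, a minimal sketch of how the write-then-measure flow could be structured so the promise chain is real (this assumes the mp3-duration package, the promise-based fs API, and that data.Body is a Buffer; it is only an illustration, not a tested Lambda handler):
const fs = require('fs').promises;
const mp3Duration = require('mp3-duration');

s3.getObject({ Bucket: bucketName, Key: 'audio/file' }, function (err, data) {
  if (err) {
    return console.error(err.code, '-', err.message);
  }
  // fs.promises.writeFile returns a promise, so .then() runs only after
  // the file has actually been written to /tmp
  fs.writeFile('/tmp/file', data.Body)
    .then(() => {
      mp3Duration('/tmp/file', function (err, duration) {
        if (err) return console.log(err.message);
        console.log('Your file is ' + duration + ' seconds long');
      });
    })
    .catch(console.error);
});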

How do I make a form-data request in node pulling the files from s3

Hey everyone, so I am trying to make this type of request in nodejs. I assume you can do it with multer, but there is one major catch: I don't want to download the file or upload it from a form. I want to pull it directly from S3, get the object, and send it as a file along with the other data to my route. Is it possible to do that?
Yes, it's completely possible. Assuming you know your way around the aws-sdk, you can create a method for retrieving the file, use it to get the data in your route, and do whatever you please with it.
Example: (Helper Method)
getDataFromS3(filename, bucket, callback) {
  var params = {
    Bucket: bucket,
    Key: filename
  };
  s3.getObject(params, function(err, data) {
    if (err) {
      callback(true, err.stack); // an error occurred
    }
    else {
      callback(false, data); // success in retrieving data
    }
  });
}
Your Route:
app.post('/something', (req, res) => {
  var s3Object = getDataFromS3('filename', 'bucket', (err, file) => {
    if (err) {
      return res.json({ message: 'File retrieval failed' });
    }
    var routeProperties = {};
    routeProperties.file = file;
    routeProperties.someOtherdata = req.body.someOtherData;
    return res.json({ routeProperties });
  });
});
Of course, the code might not be totally correct. But this is an approach that you can use to get what you want. Hope this helps.
There are two ways that I see here. You can either:
1. Pipe this request to the user: you still download it and pass it through, but you don't save it anywhere; you just stream it through your backend.
There is a very similar question asked here: Streaming file from S3 with Express including information on length and filetype
I'm just going to copy & paste the code snippet for reference on how it could be done:
function sendResponseStream(req, res) {
  const s3 = new AWS.S3();
  s3.getObject({ Bucket: myBucket, Key: myFile })
    .createReadStream()
    .pipe(res);
}
2. If the file gets too big for you to easily handle, create a presigned URL in S3 and send it through. The user can then download the file directly from S3 for a limited amount of time; more details here: https://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURL.html
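A rough sketch of the presigned-URL option, assuming the aws-sdk v2 client and an Express route (the bucket, key, and expiry values here are placeholders):
app.get('/download', (req, res) => {
  const url = s3.getSignedUrl('getObject', {
    Bucket: myBucket,
    Key: myFile,
    Expires: 60 * 5 // URL stays valid for 5 minutes
  });
  // redirect the user so they download straight from S3
  res.redirect(url);
});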

Grid fs returns corrupted file

I am facing the following issue with nodejs and gridfs. I have a bunch of .tif files I store in gridfs with gfs.createWriteStream, all of them are correct (I checked this with gdalinfo).
When I extract the files using gfs.createReadStream, some of them are corrupted; several bytes are modified in the tif header.
How can I investigate this problem? Is it also possible to read the chunks to know if they are corrupted?
Here is the code, writing to gfs:
const Grid = require('gridfs-stream');
var gfs = new Grid(mongoose.connection.db, mongoose.mongo);
[...]
var readstream = fs.createReadStream(filePath);
var writestream = gfs.createWriteStream({
filename: filename,
metadata: metadata,
mode: 'w',
content_type: 'image/tiff'
})
[..]
readstream.pipe(writestream);
The extraction is similar.
[EDIT]
Actually, after further investigation, I realized that the corruption came before GridFS:
If I create a write stream to disk (using fs) at the same time as I create a write stream to GridFS, I also get that same error in the files. So it seems that it is only related to fs TIF read/write ...
async.eachLimit(filesToCopy, 4, function(file, next) {
  var filePath = path.join(inputFolder, file);
  var readStream = fs.createReadStream(filePath);
  readStream.on('error', function(err) {
    // do something
    next(err);
  });
  var writestream = fs.createWriteStream(newFilePath);
  writestream.on('close', function(writtenfile) {
    // do something
    next();
  });
  readStream.pipe(writestream);
}, function(error) {
  if (error) {
    return callback(error);
  }
  callback(null, ...)
});
Actually the problem didn't come from GridFS, nor from the reading.
The problem was that the .tif file was read by Node.js before it was completely flushed. This explains why it was so random, and why it was always the same byte that was corrupted.
Setting a timeout before file reading solved the issue.
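A rough illustration of that workaround (the delay value is arbitrary, just to show the shape of the fix):
// wait a moment for the external writer to finish flushing the .tif
// before opening it for reading
setTimeout(function() {
  var readStream = fs.createReadStream(filePath);
  readStream.pipe(writestream);
}, 1000);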
Thanks robertklep for your posts, they helped me find the solution.

NodeJS - Check whether a SFTP remote file exists using "Sequest"

I'm new to NodeJS and I'm using the "Sequest" package for reading the contents of a remote SFTP file. It works great. However, if the file that I'm trying to read does not exist, it throws an exception and the app does not respond further.
So I want to check whether the file exists before trying to read it. Since I'm using a library function (sequest.get), I'm unable to handle the exception that occurs in the library method due to the absence of the specified file.
Below is my code:
var reader = sequest.get('xyz#abc', fileName, opts);
reader.setEncoding('utf8');
reader.on('data', function(chunk) {
  return res.send(chunk);
});
reader.on('end', function() {
  console.log('there will be no more data.');
});
Ref: https://github.com/mikeal/sequest#gethost-path-opts
Sequest (https://github.com/mikeal/sequest) is a wrapper to SSH2 - (https://github.com/mscdex/ssh2).
Any help is greatly appreciated. Thank you.
You can listen to the error event to handle such cases.
var reader = sequest.get('xyz#abc', fileName, opts);
reader.setEncoding('utf8');
reader.on('data', function(chunk) {
  return res.send(chunk);
});
reader.on('end', function() {
  console.log('there will be no more data.');
});
reader.on('error', function() {
  console.log('file not found or some other error');
});
