Learning how to do large file manipulation with Node and streams, I'm stuck in the middle of a file change: when I pass the results down to a module, I think the process is still in memory by the time it reaches another module.
I get a zip from an S3 bucket locally and unzip the contents:
const fs = require('fs-extra')
const unzipper = require('unzipper')

try {
  const stream = fs.createReadStream(zipFile).pipe(unzipper.Extract({ path }))
  stream.on('error', err => console.error(err))
  stream.on('close', async () => {
    fs.removeSync(zipFile)
    try {
      const neededFile = await dir(path) // delete files not needed from zip, rename and return named file
      await mod1(neededFile) // review file, edit and return info
      await mod2(neededFile, data) // pass down data for further changes
      return
    } catch (err) {
      console.log('error')
    }
  })
} catch (err) {
  console.log('stream error')
}
During the initial unzip I learned that there is a difference between the stream's 'close' and 'finish' events: listening for 'finish' let me pass the file to the first module and start the manipulation, but (I guess due to the size) the output and the file never matched. After cleaning out the files I don't need, I pass the renamed file to mod1 for changes and run a writeFileSync:
mod1.js:
const fs = require('fs-extra')
module.exports = file => {
fs.readFile(file, 'utf8', (err, data) => {
if (err) return console.log(err)
try {
const result = data.replace(/: /gm, `:`).replace(/(?<=location:")foobar(?=")/gm, '')
fs.writeFileSync(file, result)
} catch (err) {
console.log(err)
return err
}
})
}
When I tried to do the above with:
const readStream = fs.createReadStream(file)
const writeStream = fs.createWriteStream(file)
readStream.on('data', chunk => {
const data = chunk.toString().replace(/: /gm, `:`).replace(/(?<=location:")foobar(?=")/gm, '')
writeStream.write(data)
})
readStream.on('end', () => {
writeStream.close()
})
the file would always end up blank.
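I suspect this is because createWriteStream on the same path truncates the file while it is still being read. What I think I actually need is to stream into a temporary file and swap it in afterwards. A rough sketch of that idea (untested, and the .tmp suffix is just an illustration):

const { pipeline, Transform } = require('stream')

const tmpFile = `${file}.tmp` // temporary output so the source isn't truncated mid-read

const replacer = new Transform({
  transform(chunk, encoding, callback) {
    // note: a match could in theory be split across chunk boundaries
    callback(null, chunk.toString().replace(/: /gm, ':').replace(/(?<=location:")foobar(?=")/gm, ''))
  }
})

pipeline(
  fs.createReadStream(file),
  replacer,
  fs.createWriteStream(tmpFile),
  err => {
    if (err) return console.error(err)
    fs.moveSync(tmpFile, file, { overwrite: true }) // swap in the result once writing has finished
  }
)

After the writeFileSync in mod1, I proceed to the next module to search for a line reference: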
mod2.js:
const fs = require('fs-extra')
module.exports = (file, data) => {
const parseFile = fs.readFileSync(file, 'utf8')
parseFile.split(/\r?\n/).map((line, idx) => {
if (line.includes(data)) console.log(idx + 1)
})
}
but the line number returned is that of the initially unzipped file, not the file that was modified by the first module. Since I thought the sync calls would operate on the file on disk, it would appear the file being referenced is still in memory? These are my search results from learning about streams:
Working with Node.js Stream API
Stream
How to use stream.pipe
Understanding Streams in Node.js
Node.js Streams: Everything you need to know
Streams, Piping, and Their Error Handling in Node.js
Writing to Files in Node.js
Error handling with node.js streams
Node.js Readable file stream not getting data
Node.js stream 'end' event not firing
NodeJS streams not awaiting async
stream-handbook
How should a file be manipulated after an unzip stream, and why does the second module reference the file as it was after unzipping rather than after it was modified? Is it possible to write multiple streams synchronously?
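For reference, this is roughly the sequential flow I am aiming for: wrapping the unzip stream in a promise so each later step only starts once extraction has fully closed (just a sketch of the structure, not tested):

const unzip = () =>
  new Promise((resolve, reject) => {
    fs.createReadStream(zipFile)
      .pipe(unzipper.Extract({ path }))
      .on('error', reject)
      .on('close', resolve) // 'close' fires once the extracted files are flushed to disk
  })

async function run() {
  await unzip()
  fs.removeSync(zipFile)
  const neededFile = await dir(path)
  await mod1(neededFile) // this await only helps if mod1 itself returns a promise that resolves after its write
  await mod2(neededFile, data)
}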
Related
I have a set of videos I want to take a screenshot from each of them, then do some processing on these generated images, and finally store them.
To be able to do the processing I need to get the screenshots as buffers.
This is my code:
ffmpeg(videoFilePath)
.screenshots({
count: 1,
timestamps: ['5%'],
folder: DestinationFolderPath,
size: thumbnailWidth + 'x' + thumbnailHeight,
})
.on('error', function (error) {
  console.log(error)
});
as you see the output is being directly stored in the DestinationFolderPath. Instead of that I want to get the output as a buffer.
I'm not sure how to do that directly, but the screenshot is saved in a folder in your file system, so you could read the file from there and convert it into a buffer.
const thumbnailStream = createReadStream(thumbnailPath)
const thumbnailBuffer = await stream2buffer(thumbnailStream)
There are a lot of ways of transforming a stream into a buffer; you can check them out in this question.
e.g. from this answer
function stream2buffer(stream) {
return new Promise((resolve, reject) => {
const _buf = [];
stream.on("data", (chunk) => _buf.push(chunk));
stream.on("end", () => resolve(Buffer.concat(_buf)));
stream.on("error", (err) => reject(err));
});
}
const thumbnailStream = createReadStream(thumbnailPath)
const thumbnailBuffer = await stream2buffer(thumbnailStream)
And createReadStream is imported from fs
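Putting it together, something along these lines might work: wait for fluent-ffmpeg's 'end' event and then read the generated screenshot back as a buffer (a sketch; the filename option and the paths are assumptions about your setup):

const { createReadStream } = require('fs')
const path = require('path')

ffmpeg(videoFilePath)
  .screenshots({
    count: 1,
    timestamps: ['5%'],
    folder: DestinationFolderPath,
    filename: 'thumbnail.png', // assumed fixed name so we know what to read back
    size: thumbnailWidth + 'x' + thumbnailHeight,
  })
  .on('error', (error) => console.log(error))
  .on('end', async () => {
    // the screenshot exists on disk at this point, so read it and convert it
    const thumbnailPath = path.join(DestinationFolderPath, 'thumbnail.png')
    const thumbnailBuffer = await stream2buffer(createReadStream(thumbnailPath))
    // ...do your processing with thumbnailBuffer here
  })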
I currently have a CSV file that is 1.3 million lines. I'm trying to parse this file line by line and run a process on each line. The issue I am running into is that I run out of heap memory. I've read online and tried a bunch of solutions to avoid storing the entire file in memory, but nothing seems to work. Here is my current code:
const readLine = createInterface({
input: createReadStream(file),
crlfDelay: Infinity
});
readLine.on('line', async (line) => {
let record = parse2(`${line}`, {
delimiter: ',',
skip_empty_lines: true,
skip_lines_with_empty_values: false
});
// Do something with record
index++;
if (index % 1000 === 0) {
console.log(index);
}
});
// halts process until all lines have been processed
await once(readLine, 'close');
This starts off strong, but slowly the heap gets filled, and I run out of memory and the program crashes. I'm using a readstream, so I don't understand why the file is filling the heap.
Try using the csv-parser library: https://www.npmjs.com/package/csv-parser
const csv = require('csv-parser');
const fs = require('fs');
fs.createReadStream('data.csv')
.pipe(csv())
.on('data', (row) => {
console.log(row);
})
.on('end', () => {
console.log('CSV file successfully processed');
});
Taken from: https://stackabuse.com/reading-and-writing-csv-files-with-node-js/
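If the work you do per record is asynchronous, the 'data' handler above won't wait for it, so rows can still pile up in memory. A sketch of one way around that (assuming Node 10+, where streams are async iterable; processRow is a stand-in for your per-record processing):

const csv = require('csv-parser');
const fs = require('fs');

async function run() {
  const stream = fs.createReadStream('data.csv').pipe(csv());
  let index = 0;
  // for await...of applies backpressure: the next row is only read
  // once the previous one has been handled
  for await (const row of stream) {
    await processRow(row); // hypothetical per-record work
    index++;
    if (index % 1000 === 0) console.log(index);
  }
  console.log('CSV file successfully processed');
}

run().catch(console.error);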
I had tried something similar for a file of ~2GB and it worked without any issue with EventStream:
var fs = require('fs');
var eventStream = require('event-stream');
fs
.createReadStream('veryLargeFile.txt')
.pipe(eventStream.split())
.pipe(
eventStream
.mapSync(function(line) {
// Do something with record `line`
}).on('error', function(err) {
console.log('Error while reading file.', err);
})
.on('end', function() {
// On End
})
)
Please try and let me know if it helps
The front-end is written in ReactJS, more specifically grommet. There are multiple pdf files to be served to the user on clicking the Download button. The files are stored in GridFS. I wish to give the user a zipped folder which contains all these files. How can I achieve this?
Thanks in advance.
I have it!! Super simple solution with archiver. Worked on the first try.
Note: I am using sails.js. DBFile is my Model.
const GridFsAdapter = require('../adapters/gridfs-adapter');
const archiver = require('archiver');
async function downloadMultiple (query, res, filename) {
// create a stream for download
const archive = archiver('zip', {
zlib: {level: 9} // Sets the compression level.
});
// catch warnings (ie stat failures and other non-blocking errors)
archive.on('warning', (err) => {
if (err.code === 'ENOENT') {
// log warning
sails.log.warn(err);
} else {
// throw error
throw err;
}
});
archive.on('error', (err) => {
throw err;
});
// set file name
res.attachment(filename);
// pipe the stream to response before appending files/streams
archive.pipe(res);
// add your streams
await DBFile
.stream(query)
// like mongoDBs cursor.forEach() function. Avoids to have all record in memory at once
.eachRecord(async (dbFile) => {
// get the stream from db
const {stream, data} = await GridFsAdapter().read(dbFile.fileId);
// append stream including filename to download stream
archive.append(stream, {name: data.filename});
});
// tell the download stream, you have all your files added
archive.finalize();
}
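If you are not on sails.js, the same idea should translate to the native MongoDB driver with Express. A rough sketch (db, the files array, and the route wiring are assumptions about your setup):

const archiver = require('archiver');
const { GridFSBucket, ObjectId } = require('mongodb');

// files is assumed to be an array of { fileId, filename } describing what to zip
function downloadZip(db, files, res, zipName) {
  const bucket = new GridFSBucket(db);
  const archive = archiver('zip', { zlib: { level: 9 } });
  archive.on('error', (err) => res.status(500).end(err.message));
  // set the download filename and pipe the archive into the response
  res.attachment(zipName);
  archive.pipe(res);
  for (const { fileId, filename } of files) {
    // openDownloadStream returns a readable stream for the stored file
    archive.append(bucket.openDownloadStream(new ObjectId(fileId)), { name: filename });
  }
  archive.finalize();
}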
I am trying to access a file in a private S3 bucket from a lambda function identified by Cognito.
Reading the stream works outside a lambda but not inside a lambda
Creating a pre-signed url works inside a lambda
Waiting for the content to be ready as a string works inside a lambda
I've managed to get a pre-signed url to download the file. Using the same parameters, I've tried to write the read stream to a local file. A file gets created but it's empty. I couldn't catch any error in the process.
const s3 = new AWS.S3({ apiVersion: 'latest' });
const file = 's3Filename.csv'
const userId = event.requestContext.identity.cognitoIdentityId;
const s3Params = {
Bucket: 'MY_BUCKET',
Key: `private/${userId}/${file}`,
};
var fileStream = require('fs').createWriteStream('/path/to/my/file.csv');
var s3Stream = s3.getObject(s3Params).createReadStream();
// Try to print s3 stream errors
s3Stream
.on('error', function (err) {
console.error(err); // prints nothing
});
// Try to print fs errors
s3Stream
.pipe(fileStream)
.on('error', function (err) {
console.error('File Stream:', err); // prints nothing
})
.on('data', function (chunk) {
console.log(chunk); // prints nothing
})
.on('end', function () {
console.log('All the data in the file has been read'); // prints nothing
})
.on('close', function (err) {
console.log('Stream has been Closed'); // prints nothing
});
I am quite confident that my parameters are correct because I can get a pre-signed url that allows me to download the file.
console.log(s3.getSignedUrl('getObject', s3Params));
I can also read the file content using getObject().promise(). This could work but I'm parsing a CSV file and I'd rather go easy on the memory and parse the stream.
try
{
const s3Response = await s3.getObject(s3Params).promise();
let objectData = s3Response.Body.toString('utf-8');
console.log(objectData);
}
catch (ex)
{
console.error(ex);
}
Why is the file created from S3 stream empty? And why is there nothing that prints?
Could it be an access policy issue? If that's the case, why didn't I get any error when executing?
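In case it's relevant, this is how I would try to make the handler wait for the stream before returning, by wrapping the pipe in a promise (a sketch, not yet verified inside the Lambda; /tmp/file.csv is just an illustrative path, since /tmp is the writable location in Lambda):

const fs = require('fs');

// inside an async handler
await new Promise((resolve, reject) => {
  s3.getObject(s3Params)
    .createReadStream()
    .on('error', reject) // S3 read errors
    .pipe(fs.createWriteStream('/tmp/file.csv'))
    .on('error', reject) // write errors
    .on('finish', resolve); // resolve only once the file has been fully written
});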
I am trying to write an app that uploads files to an ftp server in node.js using the npm module ftp. I have a file, foo.txt, whose content is a single line: "This is a test file to upload via ftp." My code is:
var Client = require("ftp");
var fs = require("fs");
var connection = require("./connections.js");
var c = new Client();
const ftpFolder = "./files/";
var fileList = [];
fs.readdir(ftpFolder, (err, files) => {
if(err) {
console.log(err);
} else {
files.forEach(file => {
console.log(file);
fileList.push(file);
});
}
console.log(fileList);
});
c.on("ready", function(){
fileList.forEach(file => {
c.put(file, "/backups/" + file, function(err){
if(err){
console.log(err);
} else {
console.log(file + " was uploaded successfully!");
}
c.end();
});
});
});
// Connect to ftp site
c.connect(connection.server_ftp);
I see the file foo.txt on the ftp server, but when I open it the contents are: "foo.txt". It appears to have written the name of the file to the file rather than uploading it. Any guidance would be appreciated!
When you read a directory, it gives you a list of files. It doesn't read the contents of the file, it just lists the names of the files in the dir.
You will need to use this file name to create a path to read the file from.
const path = require('path')
let filePath = path.join(ftpFolder, file)
fs.readFile(filePath, 'utf8', (err, data) => {
  // do the upload
})
As a side note... While your directory reading may work, you should consider reading the directory after the connection is established. Otherwise, there is a chance it will fail because of the race condition between the client connection and the directory read: a directory with enough files could resolve AFTER the client connects, leaving fileList empty when the "ready" handler runs.
You could nest the callbacks, but another way to handle this is Promises. You can kick off both async methods at the same time and handle the results when both have resolved:
var filesPromise = new Promise((resolve, reject) => {
  fs.readdir(ftpFolder, (err, files) => {
    if (err) reject(err)
    else resolve(files)
  })
})
var connectionPromise = new Promise((resolve, reject) => {
  c.on("ready", () => { resolve(c) })
  c.connect(connection.server_ftp)
})
Promise.all([filesPromise, connectionPromise]).then(results => {
  results[0] // files
  results[1] // client
}).catch(err => { console.error(err) })
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise
In my case, which was very similar, I had put just the filename instead of the full path to the file in c.put. From your code, I think it is the same issue.
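In other words, the change in the original loop might look roughly like this (a sketch, assuming the files live under ftpFolder):

const path = require("path");

fileList.forEach(file => {
  // pass the full local path so the file's contents get uploaded,
  // not the literal string "foo.txt"
  c.put(path.join(ftpFolder, file), "/backups/" + file, function (err) {
    if (err) console.log(err);
    else console.log(file + " was uploaded successfully!");
  });
});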