How to use promises correctly with multiple piped writestreams - node.js

var promisePipe = require("promisepipe");
var fs = require("fs");
var crypt = require("crypto");
// ....
var files = ['/mnt/Storage/test.txt', '/mnt/Storage/test2.txt', '/mnt/Storage/test3.txt'];
var promises = files.map(function(file_enc) {
  return new Promise(function(resolve, reject) {
    var file_out = file_enc + '.locked';
    promisePipe(
      fs.createReadStream(file_enc),
      crypt.createCipheriv(alg, genhashsub, iv),
      fs.createWriteStream(file_out),
    ).then(function(streams){
      console.log('File written: ' + file_out);
      // Promise.resolve(file_out); // tried but doesn't seem to do anything
    }, function(err) {
      if(err.message.substring(0, 7) === 'EACCES:') {
        console.log('Error (file ' + file_out + '): Insufficient rights on file or folder');
      } else {
        console.log('Error (file ' + file_out + '): ' + err);
      }
      // Promise.reject(new Error(err)); // tried but doesn't seem to do anything
    });
  });
});
Promise.all(promises).then(final_function(argument));
I'm trying to encrypt files contained in an array named files.
For the sake of simplicity I added them manually in this example.
What I want to happen:
Create promises array to call with promises.all on completion
Iterate through the array
Create promise for this IO operation
Read file \
Encrypt file -- all done using streams, due to large files (+3GB)
Write file /
On finish write, resolve promise for this IO operation
Run finishing script once all promises have resolved (or rejected)
What happens:
Encryption of first file starts
.then(final_function(argument)) is called
Encryption of first file ends
The files all get encrypted correctly and they can be decrypted afterwards.
Also, errors are displayed, as well as the write confirmations.
I've searched both Stack Overflow and Google and found some similar questions (with answers), but they don't help because many are outdated. They work until I rewrite them into promises, and then I'm back where I started.
I could also list 8 different ways to attempt this job, using npm modules or vanilla code, but all of them fail in one way or another.

If you already have a promise at your disposal (and promisepipe appears to create a promise), then you generally should not use new Promise(). It looks like your main problem is that you are creating promises that you never resolve.
The other problem is that you are calling final_function in the last line instead of passing a function that will call final_function.
I suggest giving this a try:
var promises = files.map(function(file_enc) {
  var file_out = file_enc + '.locked';
  return promisePipe(
    fs.createReadStream(file_enc),
    crypt.createCipheriv(alg, genhashsub, iv),
    fs.createWriteStream(file_out),
  ).then(function(streams){
    console.log('File written: ' + file_out);
    return file_out;
  }, function(err) {
    if(err.message.substring(0, 7) === 'EACCES:') {
      console.log('Error (file ' + file_out + '): Insufficient rights on file or folder');
    } else {
      console.log('Error (file ' + file_out + '): ' + err);
    }
    throw new Error(err);
  });
});
Promise.all(promises).then(() => final_function(argument));
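Note that because the error handler re-throws, Promise.all will reject as soon as any single file fails. If you want final_function to run only after every file has either succeeded or failed (as the question describes), you can settle each promise yourself; on newer Node versions Promise.allSettled does the same thing. A minimal sketch, reusing the promises array from above:
var settled = promises.map(function(p) {
  return p.then(
    function(file_out) { return { ok: true, file: file_out }; },
    function(err) { return { ok: false, error: err }; }
  );
});
Promise.all(settled).then(function(results) {
  // runs once every file has finished, successfully or not
  final_function(argument);
});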

Here is an analogous short, file-based process wrapped in a Promise.all. You could do your encryption inside encrypt(), which is wrapped in the file handler; encrypt() returns a promise.
segments is your array of files needing work.
var filHndlr = function(segment){
  var uri = segment.uri;
  var path = '/tmp/' + uri;
  return that.getFile(path)
    .then(datatss => {
      return that.encrypt(uri, datatss);
    });
}
...
Promise.all(segments.map(filHndlr))
  .then(resp => { ... });

Related

Function call inside loop gets called after the loop ends

So I use this API that helps me turn a .docx file into a .pdf. I placed the code that converts the file into a function:
function conv(){
  convertapi.convert('pdf', { File: final_path })
    .then(function(result) {
      // get converted file url
      console.log("Converted file url: " + result.file.url);
      finp = path + file_name.slice(0, file_name.length - 5) + ".pdf";
      console.log(finp);
      // save to file
      return result.file.save(finp);
    })
    .then(function(file) {
      console.log("File saved: " + file);
      process.exit(1);
    })
    .catch(function(e) {
      // "the file name and/or extension are wrong"
      console.log("numele si/sau extensia fisierului sunt gresite");
      process.exit(1);
    });
}
The code above works only for one file at a time. I made a loop that goes through every file (.docx) in my folder and saves its name into an array. I then go through every item of the array and call the function:
for(var j = 0; j <= i; j++){
  file_name = toate_nume[j];
  final_path = path + file_name;
  conv();
}
The file names are stored correctly, but when I run my project, the function is called only after the loop itself ends (it is called the correct number of times, though). So if I have 2 files, test1.docx and test2.docx, the output shows me that conv() is called 2 times for test2.docx, instead of one time for each file. What should I do?
The reason might be this:
The API is slow, so your program is executing the loop faster than the API can handle the requests. What ends up happening is that you have modified the final_path variable twice before convertapi gets called, and then it gets called twice with the same final_path. Try to modify your conv function so that it accepts a parameter, e.g. path, and uses that. Then call conv with the current final_path as the parameter:
conv(final_path)
And:
function conv(path) {
  convertapi.convert('pdf', { File: path })
  ...
So you are calling n promises in series, and you want to wait for the end?
You can use Promise.all:
const toate_nume = ['fileName1', 'fileName2'];
const spawn = toate_nume.map(x => {
  const final_path = path + x;
  return conv(final_path);
});
Promise.all(spawn).then(results => {
  console.log('All operation done successfully %o', results);
});
or use await (inside an async function):
const results = await Promise.all(spawn);
results is an array, with an entry for each call.
Note: I pass the path as an argument instead of using a global variable.
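One caveat worth spelling out: for Promise.all to actually wait, conv has to return the convertapi promise chain, which the original function does not. A minimal sketch of a conv that does (the .pdf path handling assumes the argument ends in ".docx", and process.exit is dropped so the first file doesn't kill the process):
function conv(path) {
  // return the chain so the caller's Promise.all can wait on it
  return convertapi.convert('pdf', { File: path })
    .then(function(result) {
      console.log("Converted file url: " + result.file.url);
      var finp = path.slice(0, path.length - 5) + ".pdf"; // swap .docx for .pdf
      return result.file.save(finp);
    })
    .then(function(file) {
      console.log("File saved: " + file);
      return file;
    });
}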

GCF "No Such Object" when the Object in question was just created

I'm setting up a Google Cloud Functions (GCF) function that gets triggered often enough that there are multiple instances running at the same time.
I am getting errors from a readStream saying that the source file of the stream does not exist, even though at this point in my program I've just created it.
I've made sure the file exists before the start of the stream by console.log()-ing the file JSON, so the file does actually exist. I've also made sure that the file I'm trying to access has finished being written by a previous stream with an await, but no dice.
EDIT: The code now contains the entire script. The section that seems to be throwing the error is the function columnDelete().
var parse = require('fast-csv');
var Storage = require('@google-cloud/storage');
var Transform = require('readable-stream').Transform;
var storage = new Storage();
var bucket = storage.bucket('<BUCKET>');
const DMSs = ['PBS','CDK','One_Eighty','InfoBahn'];
class DeleteColumns extends Transform{
  constructor(){
    super({objectMode:true})
  }
  _transform(row, enc, done){
    //create an array 2 elements shorter than received
    let newRow = new Array(row.length - 2);
    //write all data but the first two columns
    for(let i = 0; i < newRow.length; i++){
      newRow[i] = row[i+2];
    }
    this.push(newRow.toString() + '\n');
    done();
  }
}
function rename(file, originalFile, DMS){
  return new Promise((resolve, reject) => {
    var dealer;
    var date;
    var header = true;
    var parser = parse({delimiter : ",", quote:'\\'});
    //for each row of data
    var stream = originalFile.createReadStream();
    stream.pipe(parser)
      .on('data', (row)=>{
        //if this is the first line do nothing
        if(header){
          header = false;
        }
        //otherwise record the contents of the first two columns and then destroy the stream
        else {
          dealer = row[0].toString().replace('"', '').replace('"', '');
          date = row[1].toString().replace('"', '').replace('"', '');
          stream.end();
        }
      })
      .on('finish', function(){
        var newName = dealer + ' ' + date + '_' + DMS + 'temp.csv';
        //if this was not triggered by the renaming of a file
        if(!file.name.includes(dealer) && !file.name.includes(':')){
          console.log('Renamed ' + file.name);
          originalFile.copy(newName);
          originalFile.copy(newName.replace('temp',''));
        } else {
          newName = 'Not Renamed';
          console.log('Oops, triggered by the rename');
        }
        resolve(newName);
      });
  });
}
function columnDelete(fileName){
  return new Promise((resolve, reject) =>{
    console.log('Deleting Columns...');
    console.log(bucket.file(fileName));
    var parser = parse({delimiter : ",", quote:'\\'});
    var del = new DeleteColumns();
    var temp = bucket.file(fileName);
    var final = bucket.file(fileName.replace('temp', ''));
    //for each row of data
    temp.createReadStream()
      //parse the csv
      .pipe(parser)
      //delete first two columns
      .pipe(del)
      //write to new file
      .pipe(final.createWriteStream()
        .on('finish', function(){
          console.log('Columns Deleted');
          temp.delete();
          resolve();
        })
      );
  });
}
exports.triggerRename = async(data, context) => {
  var DMS = 'Triple';
  var file = data;
  //if not a temporary file
  if(!file.name.includes('temp')){
    //create a new File object from the name of the data passed
    const originalFile = bucket.file(file.name);
    //identify which database this data is from
    DMSs.forEach(function(database){
      if(file.name.includes(database)){
        DMS = database;
      }
    });
    //rename the file
    var tempName = await rename(file, originalFile, DMS);
    //if it was renamed, delete the extra columns
    if (!tempName.includes('Not Renamed')){
      await columnDelete(tempName);
    }
  } else if(file.name.includes('undefined')){
    console.log(file.name + ' is invalid. Deleted.');
    bucket.file(file.name).delete();
  } else {
    console.log(file.name + ' is a temporary file. Did not rename.');
  }
};
What I expect to be output is as below:
Deleting Columns...
Columns Deleted
Nice and simple, letting us know when it has started and finished.
However, I get this instead:
Deleting Columns...
ApiError: No such object: <file> at at Object.parseHttpRespMessage(......)
finished with status: 'crash'
Which is not wanted for obvious reasons. My next thought is to make sure that the file hasn't been deleted by another instance of the script midway through, but to do that I would have to check to see if the file is being used by another stream, which is, to my knowledge, not possible.
Any ideas out there?
When I was creating the file I called the asynchronous function copy() and moved on, meaning that when I tried to access the file it had not finished copying. Unknown to me, the File object is just a reference and does not actually contain the file itself. While the file was copying, the reference was present but it pointed to an unfinished file.
Thus, "No Such Object". To fix this, I simply used a callback to make sure that the copying was finished before accessing the file.
Thanks to Doug Stevenson for letting me know about the pointer!
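For illustration, this is roughly what the fix looks like inside rename() above; in the Cloud Storage client, copy() also returns a promise when no callback is given, so waiting on both copies before resolving works as well (a sketch, not the exact code used):
// inside the 'finish' handler of rename(), instead of resolving immediately:
Promise.all([
  originalFile.copy(newName),
  originalFile.copy(newName.replace('temp', ''))
]).then(function() {
  resolve(newName); // only resolve once both copies have completed
}).catch(reject);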

How to wait for the response of all async fs.rename without blocking the process?

I built an Angular/Node app that renames files in network folders. The number of files it renames is between 300 and 500. I use await so I get notified when renaming is done. It takes 8-10 minutes per run, and it can't rename files simultaneously since I am using await.
I need to pass the number of renamed files and I need to show the user that the renaming is already complete. If I don't use async/await, how can my angular front-end know that the renaming is completed?
My full code is in here: https://github.com/ericute/renamer
Here's where I'm having a trouble with:
await walk(folderPath, function(err, results) {
  if (err) throw err;
  results.forEach(file => {
    if (fs.lstatSync(file).isFile) {
      fileCounter++;
    }
    let fileBasename = path.basename(file);
    let filePath = path.dirname(file);
    if (!filesForRenaming[path.basename(file)]) {
      //In a javascript forEach loop,
      //return is the equivalent of continue
      //https://stackoverflow.com/questions/31399411/go-to-next-iteration-in-javascript-foreach-loop
      return;
    }
    let description = filesForRenaming[path.basename(file)].description;
    // Process instances where the absolute file name exceeds 255 characters.
    let tempNewName = path.resolve(filePath, description + "_" + fileBasename);
    let tempNewNameLength = tempNewName.length;
    let newName = '';
    if (tempNewNameLength > 255) {
      let excess = 254 - tempNewNameLength;
      if (description.length > Math.abs(excess)) {
        description = description.substring(0, (description.length - Math.abs(excess)));
      }
      newName = path.resolve(filePath, description + "_" + fileBasename);
    } else {
      newName = tempNewName;
    }
    renamedFiles++;
    // Actual File Renaming
    fs.renameSync(file, newName, (err) => {
      if (err) {
        errList.push(err);
      }
      renamedFiles++;
    });
  });
  if (Object.keys(errList).length > 0) {
    res.send({"status":"error", "errors": errList});
  } else {
    res.send({
      "status":"success",
      "filesFoundInDocData": Object.keys(filesForRenaming).length,
      "filesFound": fileCounter,
      "renamedFiles": renamedFiles,
      "startDate": startDate
    });
  }
});
If you're using any sync methods you're basically blocking the event loop. You should change the structure of your code and start using promises everywhere. You can then create another service in Angular that checks whether the renaming process is completed, using a timer (interval) and GET requests (the easiest way). For example, you could have Angular fetch data from "/isRenameCompleted" and alert the user once the result says it's done. To get real-time results you would have to switch to socket.io. A quick solution for one client (for more clients you would need to store unique IDs for each request and fetch the corresponding promises) could be this:
1: Create two global variables on top of your code
var filesStatus="waiting"
var pendingFiles=[]
2: Inside your renaming route, push every file's rename promise into the array (using a loop) and start waiting asynchronously for the renaming process to finish:
pendingFiles.push(fsPromises.rename(oldName, newName))
Promise.all(pendingFiles)
  .then(values => {
    filesStatus = "done"
  })
  .catch(error => {
    filesStatus = "error"
  });
filesStatus = "pending"
3: Now add a new route /isRenameCompleted with report logic like the following:
router.get('/isRenameCompleted', (req, res, next) => {
  if (filesStatus === "pending"){
    res.end("please wait")
  } else if (filesStatus === "done"){
    res.end("done! your files renamed")
  }
})
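Since the question also needs the number of renamed files, a variation on step 2 is to settle every rename and keep a small summary object the /isRenameCompleted route can report. A sketch, assuming the old/new names are collected into an array of pairs first (the pairs array is illustrative, not from the original code):
const fsPromises = require('fs').promises;

let renameSummary = { status: 'waiting', renamed: 0, errors: [] };

function renameAll(pairs) { // pairs: [{ oldName, newName }, ...]
  renameSummary = { status: 'pending', renamed: 0, errors: [] };
  const pending = pairs.map(({ oldName, newName }) =>
    fsPromises.rename(oldName, newName)
      .then(() => { renameSummary.renamed++; })
      .catch((err) => { renameSummary.errors.push(err.message); })
  );
  return Promise.all(pending).then(() => {
    renameSummary.status = renameSummary.errors.length > 0 ? 'error' : 'done';
    return renameSummary;
  });
}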

NodeJS - fs.stat() is ignored in a FOR-loop

I'm trying to loop over a bunch of directories and then check whether a file inside each directory exists, using NodeJS and fs.stat().
I've got a simple for-loop over the directories, and inside it the fs.stat() call that checks whether "project.xml" exists in that particular directory. My code looks like this:
for (var i = 0, length = prDirs.length; i < length; i++) {
  var path = Config["ProjectDirectory"] + "/" + prDirs[i];
  console.log("PATH=" + path);
  fs.stat(path + "/project.xml", function (err, stat) {
    if (err == null) {
      console.log(" => PATH=" + path);
    }
  })
}
NodeJS loops correctly over the directories and the console.log() outputs them all correctly, but the code inside the if inside the fs.stat() callback runs only once, at the end of the loop. My console shows this:
PATH=(...)/PHP
PATH=(...)/Electron
PATH=(...)/testapp
PATH=(...)/Vala
=> PATH=(...)/Vala/project.xml
But the project.xml I'm looking for is in testapp/, not in Vala/; Vala/ just happens to be the last entry in prDirs.
The code above is my latest attempt; I've tried plenty of other variations. One of them (appending an else to the if inside the fs.stat() callback) showed me that fs.stat() actually does get invoked, but only the code in the appended else runs, not the code inside the if.
Thanks in advance!
fs.stat is an asynchronous I/O function, so its callback will be called only once the main thread is idle, which in your case means only after the for loop is done. Instead of a for loop, I suggest iterating over the directories in an asynchronous manner. You can use async.each, async.eachSeries, or implement it yourself.
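For illustration, a rough sketch of the async.each approach mentioned above (it assumes the async package from npm is installed); because each iteration gets its own callback scope, path also stays bound to the right directory:
var async = require('async');

async.each(prDirs, function (dir, done) {
  var path = Config["ProjectDirectory"] + "/" + dir;
  fs.stat(path + "/project.xml", function (err, stat) {
    if (err == null) {
      console.log(" => PATH=" + path);
    }
    done(); // mark this directory as checked
  });
}, function () {
  // called once every directory has been checked
});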
As @Gil Z mentions, fs.stat is async. I would suggest using promises if you want to keep the forEach loop and make the code look synchronous.
Here is an example; it works on Node v6.9.
"use strict";
const fs = require('fs');
let paths = ["tasks", "aaa", "bbbb"];
//method to get stat data using promises
function checkFileExists(path) {
return new Promise((resolve, reject) => {
fs.stat(path + "/project.xml", (err, res) => {
resolve(err ? "Not found in " + path : " => PATH=" + path);
});
});
}
//create promise array with each directory
let promiseArr = [];
paths.forEach(pathPart => {
let path = process.cwd() + "/" + pathPart;
promiseArr.push(checkFileExists(path));
});
//run all promises and collect results
Promise.all(promiseArr)
.then(reslut => {
console.log(reslut);
})
.catch(e => console.log("Error in promises"));
The above code will log this
[ ' => PATH=/Users/mykolaborysiuk/Sites/circlemedia/syndic-apiManager/tasks',
'Not found in /Users/mykolaborysiuk/Sites/circlemedia/syndic-apiManager/aaa',
'Not found in /Users/mykolaborysiuk/Sites/circlemedia/syndic-apiManager/bbbb' ]
Hope this helps.

Nodejs Read very large file(~10GB), Process line by line then write to other file

I have a 10 GB log file in a particular format. I want to process this file line by line and then write the output to another file after applying some transformations. I am using Node for this operation.
Though this method works, it takes a huge amount of time. I was able to do this within 30-45 minutes in Java, but in Node it takes more than 160 minutes to do the same job.
Following is the initiation code, which reads each line from the input.
var path = '../10GB_input_file.txt';
var output_file = '../output.txt';
function fileopsmain(){
  fs.exists(output_file, function(exists){
    if(exists) {
      fs.unlink(output_file, function (err) {
        if (err) throw err;
        console.log('successfully deleted ' + output_file);
      });
    }
  });
  new lazy(fs.createReadStream(path, {bufferSize: 128 * 4096}))
    .lines
    .forEach(function(line){
      var line_arr = line.toString().split(';');
      perform_line_ops(line_arr, line_arr[6], line_arr[7], line_arr[10]);
    });
}
This is the method that performs some operations on the line and passes the result to the write method, which writes it into the output file.
function perform_line_ops(line_arr, range_start, range_end, daynums){
  var _new_lines = '';
  for(var i=0; i<days; i++){
    //perform some operation to modify line pass it to print
  }
  write_line_ops(_new_lines);
}
The following method is used to write data into the output file.
function write_line_ops(line) {
  if(line != null && line != ''){
    fs.appendFileSync(output_file, line);
  }
}
I want to bring this time down to 15-20 minutes. Is it possible to do so?
Also, for the record, I'm trying this on an Intel i7 processor with 8 GB of RAM.
You can do this easily without a module. For example:
var fs = require('fs');
var inspect = require('util').inspect;
var buffer = '';
var rs = fs.createReadStream('foo.log');
rs.on('data', function(chunk) {
  var lines = (buffer + chunk).split(/\r?\n/g);
  buffer = lines.pop();
  for (var i = 0; i < lines.length; ++i) {
    // do something with `lines[i]`
    console.log('found line: ' + inspect(lines[i]));
  }
});
rs.on('end', function() {
  // optionally process `buffer` here if you want to treat leftover data without
  // a newline as a "line"
  console.log('ended on non-empty buffer: ' + inspect(buffer));
});
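To cover the writing side as well, the same 'data' handler can feed a write stream instead of calling appendFileSync for every line. A sketch of that variant, assuming perform_line_ops is changed to return the transformed line instead of writing it itself (that change is an assumption, not part of the original code):
var ws = fs.createWriteStream(output_file);
rs.on('data', function(chunk) {
  var lines = (buffer + chunk).split(/\r?\n/g);
  buffer = lines.pop();
  for (var i = 0; i < lines.length; ++i) {
    var line_arr = lines[i].split(';');
    var out = perform_line_ops(line_arr, line_arr[6], line_arr[7], line_arr[10]);
    if (out) ws.write(out); // buffered stream write instead of appendFileSync
  }
});
rs.on('end', function() {
  ws.end(); // flush and close the output once the input is done
});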
I can't guess where the possible bottleneck in your code is.
Can you add the library or the source code of the lazy function?
How many operations does your perform_line_ops do (if/else, switch/case, function calls)?
I've created an example based on your code. I know that this does not answer your question directly, but maybe it helps you understand how Node handles such a case.
const fs = require('fs')
const path = require('path')
const inputFile = path.resolve(__dirname, '../input_file.txt')
const outputFile = path.resolve(__dirname, '../output_file.txt')
function bootstrap() {
  // fs.exists is deprecated
  // check if output file exists
  // https://nodejs.org/api/fs.html#fs_fs_exists_path_callback
  fs.exists(outputFile, (exists) => {
    if (exists) {
      // output file exists, delete it
      // https://nodejs.org/api/fs.html#fs_fs_unlink_path_callback
      fs.unlink(outputFile, (err) => {
        if (err) {
          throw err
        }
        console.info(`successfully deleted: ${outputFile}`)
        checkInputFile()
      })
    } else {
      // output file doesn't exist, move on
      checkInputFile()
    }
  })
}
function checkInputFile() {
  // check if input file can be read
  // https://nodejs.org/api/fs.html#fs_fs_access_path_mode_callback
  fs.access(inputFile, fs.constants.R_OK, (err) => {
    if (err) {
      // file can't be read, throw error
      throw err
    }
    // file can be read, move on
    loadInputFile()
  })
}
function saveToOutput() {
  // create write stream
  // https://nodejs.org/api/fs.html#fs_fs_createwritestream_path_options
  const stream = fs.createWriteStream(outputFile, {
    flags: 'w'
  })
  // return wrapper function which simply writes data into the stream
  return (data) => {
    // check if the stream is writable
    if (stream.writable) {
      if (data === null) {
        stream.end()
      } else if (data instanceof Array) {
        stream.write(data.join('\n'))
      } else {
        stream.write(data)
      }
    }
  }
}
function parseLine(line, respond) {
  respond([line])
}
function loadInputFile() {
  // create write stream
  const saveOutput = saveToOutput()
  // create read stream
  // https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options
  const stream = fs.createReadStream(inputFile, {
    autoClose: true,
    encoding: 'utf8',
    flags: 'r'
  })
  let buffer = null
  stream.on('data', (chunk) => {
    // append the buffer to the current chunk
    const lines = (buffer !== null)
      ? (buffer + chunk).split('\n')
      : chunk.split('\n')
    const lineLength = lines.length
    let lineIndex = -1
    // save last line for later (last line can be incomplete)
    buffer = lines[lineLength - 1]
    // loop through all lines
    // but don't include the last line
    while (++lineIndex < lineLength - 1) {
      parseLine(lines[lineIndex], saveOutput)
    }
  })
  stream.on('end', () => {
    if (buffer !== null && buffer.length > 0) {
      // parse the last line
      parseLine(buffer, saveOutput)
    }
    // Passing null signals the end of the stream (EOF)
    saveOutput(null)
  })
}
// kick off the parsing process
bootstrap()
I know this is old but...
At a guess, appendFileSync() writes to the file system and waits for the response. Lots of small writes are generally expensive; presuming you use a BufferedWriter in Java, you may be getting faster results there by skipping some writes.
Use one of the async writes and see whether Node buffers sensibly, or write the lines to a large Node Buffer until it is full and always write a full (or nearly full) Buffer. By tuning the buffer size you could validate whether the number of writes affects performance. I suspect it does.
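A minimal sketch of that batching idea, applied to the question's write_line_ops (the 64 KB threshold is an arbitrary choice, not something from the answer):
var out = fs.createWriteStream(output_file);
var batch = '';
var BATCH_SIZE = 64 * 1024; // flush roughly every 64 KB (arbitrary)

function write_line_ops(line) {
  if (line != null && line != '') {
    batch += line;
    if (batch.length >= BATCH_SIZE) {
      out.write(batch); // one larger async write instead of many small ones
      batch = '';
    }
  }
}

// call once after the input has been fully read
function flush_remaining() {
  if (batch.length > 0) out.write(batch);
  out.end();
}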
The execution is slow because you're not using Node's asynchronous operations. In essence, you're executing the code like this:
> read some lines
> transform
> write some lines
> repeat
Meanwhile, you could be doing everything at once, or at least reading and writing in parallel. Some examples in the answers here do that, but the syntax is complicated at best. Using scramjet you can do it in a couple of simple lines:
const {StringStream} = require('scramjet');
fs.createReadStream(path, {bufferSize: 128 * 4096})
  .pipe(new StringStream({maxParallel: 128}))     // I assume this is a utf-8 file
  .split("\n")                                    // split per line
  .parse((line) => line.split(';'))               // parse line
  .map(([line_arr, range_start, range_end, daynums]) => {
    return simplyReturnYourResultForTheOtherFileHere(
      line_arr, range_start, range_end, daynums
    ); // run your code, return promise if you're doing some async work
  })
  .stringify((result) => result.toString())
  .pipe(fs.createWriteStream(output_file))
  .on("finish", () => console.log("done"))
  .on("error", (e) => console.log("error"));
This will probably run much faster.
