nodejs express fs iterating files into array or object failing - node.js

So Im trying to use the nodejs express FS module to iterate a directory in my app, store each filename in an array, which I can pass to my express view and iterate through the list, but Im struggling to do so. When I do a console.log within the files.forEach function loop, its printing the filename just fine, but as soon as I try to do anything such as:
var myfiles = [];
var fs = require('fs');
fs.readdir('./myfiles/', function (err, files) { if (err) throw err;
files.forEach( function (file) {
myfiles.push(file);
});
});
console.log(myfiles);
it fails, just logs an empty object. So Im not sure exactly what is going on, I think it has to do with callback functions, but if someone could walk me through what Im doing wrong, and why its not working, (and how to make it work), it would be much appreciated.

The myfiles array is empty because the callback hasn't been called before you call console.log().
You'll need to do something like:
var fs = require('fs');
fs.readdir('./myfiles/',function(err,files){
if(err) throw err;
files.forEach(function(file){
// do something with each file HERE!
});
});
// because trying to do something with files here won't work because
// the callback hasn't fired yet.
Remember, everything in node happens at the same time, in the sense that, unless you're doing your processing inside your callbacks, you cannot guarantee asynchronous functions have completed yet.
One way around this problem for you would be to use an EventEmitter:
var fs=require('fs'),
EventEmitter=require('events').EventEmitter,
filesEE=new EventEmitter(),
myfiles=[];
// this event will be called when all files have been added to myfiles
filesEE.on('files_ready',function(){
console.dir(myfiles);
});
// read all files from current directory
fs.readdir('.',function(err,files){
if(err) throw err;
files.forEach(function(file){
myfiles.push(file);
});
filesEE.emit('files_ready'); // trigger files_ready event
});

As several have mentioned, you are using an async method, so you have a nondeterministic execution path.
However, there is an easy way around this. Simply use the Sync version of the method:
var myfiles = [];
var fs = require('fs');
var arrayOfFiles = fs.readdirSync('./myfiles/');
//Yes, the following is not super-smart, but you might want to process the files. This is how:
arrayOfFiles.forEach( function (file) {
myfiles.push(file);
});
console.log(myfiles);
That should work as you want. However, using sync statements is not good, so you should not do it unless it is vitally important for it to be sync.
Read more here: fs.readdirSync

fs.readdir is asynchronous (as with many operations in node.js). This means that the console.log line is going to run before readdir has a chance to call the function passed to it.
You need to either:
Put the console.log line within the callback function given to readdir, i.e:
fs.readdir('./myfiles/', function (err, files) { if (err) throw err;
files.forEach( function (file) {
myfiles.push(file);
});
console.log(myfiles);
});
Or simply perform some action with each file inside the forEach.

I think it has to do with callback functions,
Exactly.
fs.readdir makes an asynchronous request to the file system for that information, and calls the callback at some later time with the results.
So function (err, files) { ... } doesn't run immediately, but console.log(myfiles) does.
At some later point in time, myfiles will contain the desired information.
You should note BTW that files is already an Array, so there is really no point in manually appending each element to some other blank array. If the idea is to put together the results from several calls, then use .concat; if you just want to get the data once, then you can just assign myfiles = files directly.
Overall, you really ought to read up on "Continuation-passing style".

I faced the same problem, and basing on answers given in this post I've solved it with Promises, that seem to be of perfect use in this situation:
router.get('/', (req, res) => {
var viewBag = {}; // It's just my little habit from .NET MVC ;)
var readFiles = new Promise((resolve, reject) => {
fs.readdir('./myfiles/',(err,files) => {
if(err) {
reject(err);
} else {
resolve(files);
}
});
});
// showcase just in case you will need to implement more async operations before route will response
var anotherPromise = new Promise((resolve, reject) => {
doAsyncStuff((err, anotherResult) => {
if(err) {
reject(err);
} else {
resolve(anotherResult);
}
});
});
Promise.all([readFiles, anotherPromise]).then((values) => {
viewBag.files = values[0];
viewBag.otherStuff = values[1];
console.log(viewBag.files); // logs e.g. [ 'file.txt' ]
res.render('your_view', viewBag);
}).catch((errors) => {
res.render('your_view',{errors:errors}); // you can use 'errors' property to render errors in view or implement different error handling schema
});
});
Note: you don't have to push found files into new array because you already get an array from fs.readdir()'c callback. According to node docs:
The callback gets two arguments (err, files) where files is an array
of the names of the files in the directory excluding '.' and '..'.
I belive this is very elegant and handy solution, and most of all - it doesn't require you to bring in and handle new modules to your script.

Related

Redis Node - Get from hash - Not inserting into array

My goal is to insert the values gotten from a redis hash. I am using the redis package for node js.
My code is the following:
getFromHash(ids) {
const resultArray = [];
ids.forEach((id) => {
common.redisMaster.hget('mykey', id, (err, res) => {
resultArray.push(res);
});
});
console.log(resultArray);
},
The array logged at the end of the function is empty and res is not empty. What could i do to fill this array please ?
You need to use some control flow, either the async library or Promises (as described in reds docs)
Put your console.log inside the callback when the results return from the redis call. Then you will see more print out. Use one of the control flow patterns for your .forEach as well, as that is currently synchronous.
If you modify your code to something like this, it will work nicely:
var getFromHash = function getFromHash(ids) {
const resultArray = [];
ids.forEach((id) => {
common.redisMaster.hget('mykey', id, (err, res) => {
resultArray.push(res);
if (resultArray.length === ids.length) {
// All done.
console.log('getFromHash complete: ', resultArray);
}
});
});
};
In your original code you're printing the result array before any of the hget calls have returned.
Another approach will be to create an array of promises and then do a Promise.all on it.
You'll see this kind of behavior a lot with Node, remember it uses asynchronous calls for almost all i/o. When you're coming from a language where most function calls are synchronous you get tripped up by this kind of problem a lot!

Write file and usage nodejs express

For a school project I'm creating a portal for KVM using NodeJS and Express.
I need to adjust an XML file and then use that XML File to create an VM.
So i created 2 functions
CreateXML:
function createXML(req, res, next) {
var parses = new xml2js.Parser();
fs.readFile('Debian7.xml', function(err, data){
parser.parseString(data, function (err, result){
result.domain.name = req.body.name;
result.domain.memory[0]['$'].unit = "GB";
result.domain.memory[0]['_'] = req.body.ram;
result.domain.currentMemory[0]['$'].unit = "GB";
result.domain.currentMemory[0]['_'] = req.body.ram;
result.domain.vcpu = req.body.cpus;
var builder = new xml2js.Builder({headless: true});
var xml = builder.buildObject(result);
fs.writeFile('./xmlfiles/' + req.body.name + '.xml', xml, function(err, data){
if(err) console.log(err);
});
});
});
};
CreateDomain:
function createDomain(req, res){
var domainXML = fs.readFileSync('./xmlfiles/' + req.body.name + '.xml', 'utf8');
hypervisor.connect(function(){
hypervisor.createDomainAsync(domainXML).then(function (domain){
console.log('Domain Created');
res.json({success: true, msg: 'succesfully created domain'})
});
});
}
then I call these functions as middleware in my post request
apiRoutes.post('/domainCreate', createXML, createDomain);
But then when I use Postman on the api route I get the following error:
Error: ENOENT: no such file or directory, open './xmlfiles/rickyderouter23.xml'
After the error it still creates the XML file and when I create the XML file before I use postman it works fine. It's like it needs to execute both functions before the creation of the XML file, how do I create the XML file after the first function and then use it in the second function.
The answer is "it's asynchronous" (just like many, many problems in node.js/javascript).
The fs.readFile function is asynchronous: when you call it, you give it a callback function which it will call when it finishes loading the file.
The parser.parseString is asynchronous - it will call your callback function when it finishes parsing the XML.
The fs.writeFile is the same - it will call your callback function when it finishes writing the file.
The hypervisor.connect function is the same - it will call your callback function when it finishes connecting.
The middleware functions are called in order, but they both contain code that may not have completed before they return. So when your code calls createDomain and tries to read the XML file created in createXML, the XML file probably doesn't exist yet. The fs.readFile might not be finished yet; even if it is, the parser.parseString function might not be finished yet; even if that one is finished, the fs.writeFile might not be finished yet.
One way to solve this would be to put the functionality of the createXML and createDomain functions together into one middleware function. That would allow you to rewrite it so that all the function calls that depend on previous asynchronous function calls could actually wait for those calls to complete before executing. A simple way to do it would be this:
function createXML(req, res, next) {
var parses = new xml2js.Parser();
fs.readFile('Debian7.xml', function(err, data){
parser.parseString(data, function (err, result){
result.domain.name = req.body.name;
result.domain.memory[0]['$'].unit = "GB";
result.domain.memory[0]['_'] = req.body.ram;
result.domain.currentMemory[0]['$'].unit = "GB";
result.domain.currentMemory[0]['_'] = req.body.ram;
result.domain.vcpu = req.body.cpus;
var builder = new xml2js.Builder({headless: true});
var xml = builder.buildObject(result);
fs.writeFile('./xmlfiles/' + req.body.name + '.xml', xml, function(err, data){
if(err) console.log(err);
// notice the call to createDomain here - this ensure
// that the connection to the hypervisor is not started
// until the file is written
createDomain(req, res);
});
});
});
};
And change your route to:
apiRoutes.post('/domainCreate', createXML);
Now, that's pretty ugly. I don't like the idea of lumping those two middleware functions into one and I'd prefer to rewrite it to use a promise-based approach, but that's the basic the idea.

nodeJS too many child processes?

I am using node to recursively traverse a file system and make a system call for each file, by using child.exec. It works well when tested on a small structure, with a couple of folders and files, but when run on the whole home directory, it crashes after a while
child_process.js:945
throw errnoException(process._errno, 'spawn');
^
Error: spawn Unknown system errno 23
at errnoException (child_process.js:998:11)
at ChildProcess.spawn (child_process.js:945:11)
at exports.spawn (child_process.js:733:9)
at Object.exports.execFile (child_process.js:617:15)
at exports.exec (child_process.js:588:18)
Does this happen because it uses up all resources? How can I avoid this?
EDIT: Code
improvement and best practices suggestions always welcome :)
function processDir(dir, callback) {
fs.readdir(dir, function (err, files) {
if (err) {...}
if (files) {
async.each(files, function (file, cb) {
var filePath = dir + "/" + file;
var stats = fs.statSync(filePath);
if (stats) {
if (stats.isFile()) {
processFile(dir, file, function (err) {
if (err) {...}
cb();
});
} else if (stats.isDirectory()) {
processDir(filePath, function (err) {
if (err) {...}
cb();
});
}
}
}, function (err) {
if (err) {...}
callback();
}
);
}
});
}
the issue can be because of having many open files simultaneously
consider using async module to solve the issue
https://github.com/caolan/async#eachLimit
async.eachLimit(
files,
20,
function(file, callback){
//process file here and call callback
},
function(err){
//done
}
);
in current example you will process 20 files at a time
Well, I don't know the reason for the failure, but if this is what you expect (using up all of the resources) or as others say (too many files open), you could try to use multitasking for it. JXcore (fork of Node.JS) offers such thing - it allows to run a task in a separate instance, but this is done still inside one single process.
While Node.JS app as a process has its limitations - JXcore with its sub-instances multiplies those limits: single process even with one extra instance (or task, or well: we can call it sub-thread) doubles the limits!
So, let's say, that you will run each of your spawn() in a separate task. Or, since tasks are not running in a main thread any more - you can even use sync method that jxcore offers : cmdSync().
Probably the the best illustration would be given by this few lines of the code:
jxcore.tasks.setThreadCount(4);
var task = function(file) {
var your_cmd = "do something with " + file;
return jxcore.utils.cmdSync(your_cmd);
};
jxcore.tasks.addTask(task, "file1.txt", function(ret) {
console.log("the exit code:", ret.exitCode);
console.log("output:", ret.out);
});
Let me repeat: the task will not block the main thread, since it is running in a separate instance!
Multitasking API is documented here: Multitasking.
As has been established in comments, you are likely running out of file handles because you are running too many concurrent operations on your files. So, a solution is to limit the number of concurrent operations that run at once so too many files aren't in use at the same time.
Here's a somewhat different implementation that uses Bluebird promises to control both the async aspects of the operation and the concurrency aspects of the operation.
To make the management of the concurrency aspect easier, this collects the entire list of files into an array first and then processes the array of filenames rather than processing as you go. This makes it easier to use a built-in concurrency capability in Bluebird's .map() (which works on a single array) so we don't have to write that code ourselves:
var Promise = require("bluebird");
var fs = Promise.promisifyAll(require("fs"));
var path = require("path");
// recurse a directory, call a callback on each file (that returns a promise)
// run a max of numConcurrent callbacks at once
// returns a promise for when all work is done
function processDir(dir, numConcurrent, fileCallback) {
var allFiles = [];
function listDir(dir, list) {
var dirs = [];
return fs.readdirAsync(dir).map(function(file) {
var filePath = path.join(dir , file);
return fs.statAsync(filePath).then(function(stats) {
if (stats.isFile()) {
allFiles.push(filePath);
} else if (stats.isDirectory()) {
return listDir(filePath);
}
}).catch(function() {
// ignore errors on .stat - file could just be gone now
return;
});
});
}
return listDir(dir, allFiles).then(function() {
return Promise.map(allFiles, function(filename) {
return fileCallback(filename);
}, {concurrency: numConcurrent});
});
}
// example usage:
// pass the initial directory,
// the number of concurrent operations allowed at once
// and a callback function (that returns a promise) to process each file
processDir(process.cwd(), 5, function(file) {
// put your own code here to process each file
// this is code to cause each callback to take a random amount of time
// for testing purposes
var rand = Math.floor(Math.random() * 500) + 500;
return Promise.delay(rand).then(function() {
console.log(file);
});
}).catch(function(e) {
// error here
}).finally(function() {
console.log("done");
});
FYI, I think you'll find that proper error propagation and proper error handling from many async operations is much, much easier with promises than the plain callback method.

node.js file system problems

I keep banging my head against the wall because of tons of different errors. This is what the code i try to use :
fs.readFile("balance.txt", function (err, data) //At the beginning of the script (checked, it works)
{
if (err) throw err;
balance=JSON.parse(data);;
});
fs.readFile("pick.txt", function (err, data)
{
if (err) throw err;
pick=JSON.parse(data);;
});
/*....
.... balance and pick are modified
....*/
if (shutdown)
{
fs.writeFile("balance2.txt", JSON.stringify(balance));
fs.writeFile("pick2.txt", JSON.stringify(pick));
process.exit(0);
}
At the end of the script, the files have not been modified the slightest. I then found out on this site that the files were being opened 2 times simultaneously, or something like that, so i tried this :
var balance, pick;
var stream = fs.createReadStream("balance.txt");
stream.on("readable", function()
{
balance = JSON.parse(stream.read());
});
var stream2 = fs.createReadStream("pick.txt");
stream2.on("readable", function()
{
pick = JSON.parse(stream2.read());
});
/****
****/
fs.unlink("pick.txt");
fs.unlink("balance.txt");
var stream = fs.createWriteStream("balance.txt", {flags: 'w'});
var stream2 = fs.createWriteStream("pick.txt", {flags: 'w'});
stream.write(JSON.stringify(balance));
stream2.write(JSON.stringify(pick));
process.exit(0);
But, this time, both files are empty... I know i should catch errors, but i just don't see where the problem is. I don't mind storing the 2 objects in the same file, if that can helps. Besides that, I never did any javascript in my life before yesterday, so, please give me a simple explanation if you know what failed here.
What I think you want to do is use readFileSync and not use readFile to read your files since you need them to be read before doing anything else in your program (http://nodejs.org/api/fs.html#fs_fs_readfilesync_filename_options).
This will make sure you have read both the files before you execute any of the rest of your code.
Make your like code do this:
try
{
balance = JSON.parse(fs.readFileSync("balance.txt"));
pick = JSON.parse(fs.readFileSync("pick.txt"));
}
catch(err)
{ throw err; }
I think you will get the functionality you are looking for by doing this.
Note, you will not be able to check for an error in the same way you can with readFile. Instead you will need to wrap each call in a try catch or use existsSync before each operation to make sure you aren't trying to read a file that doesn't exist.
How to capture no file for fs.readFileSync()?
Furthermore, you have the same problem on the writes. You are kicking off async writes and then immediately calling process.exit(0). A better way to do this would be to either write them sequentially asynchronously and then exit or to write them sequentially synchronously then exit.
Async option:
if (shutdown)
{
fs.writeFile("balance2.txt", JSON.stringify(balance), function(err){
fs.writeFile("pick2.txt", JSON.stringify(pick), function(err){
process.exit(0);
});
});
}
Sync option:
if (shutdown)
{
fs.writeFileSync("balance2.txt", JSON.stringify(balance));
fs.writeFileSync("pick2.txt", JSON.stringify(pick));
process.exit(0);
}

How to wait for all async calls to finish

I'm using Mongoose with Node.js and have the following code that will call the callback after all the save() calls has finished. However, I feel that this is a very dirty way of doing it and would like to see the proper way to get this done.
function setup(callback) {
// Clear the DB and load fixtures
Account.remove({}, addFixtureData);
function addFixtureData() {
// Load the fixtures
fs.readFile('./fixtures/account.json', 'utf8', function(err, data) {
if (err) { throw err; }
var jsonData = JSON.parse(data);
var count = 0;
jsonData.forEach(function(json) {
count++;
var account = new Account(json);
account.save(function(err) {
if (err) { throw err; }
if (--count == 0 && callback) callback();
});
});
});
}
}
You can clean up the code a bit by using a library like async or Step.
Also, I've written a small module that handles loading fixtures for you, so you just do:
var fixtures = require('./mongoose-fixtures');
fixtures.load('./fixtures/account.json', function(err) {
//Fixtures loaded, you're ready to go
};
Github:
https://github.com/powmedia/mongoose-fixtures
It will also load a directory of fixture files, or objects.
I did a talk about common asyncronous patterns (serial and parallel) and ways to solve them:
https://github.com/masylum/i-love-async
I hope its useful.
I've recently created simpler abstraction called wait.for to call async functions in sync mode (based on Fibers). It's at an early stage but works. It is at:
https://github.com/luciotato/waitfor
Using wait.for, you can call any standard nodejs async function, as if it were a sync function, without blocking node's event loop. You can code sequentially when you need it.
using wait.for your code will be:
//in a fiber
function setup(callback) {
// Clear the DB and load fixtures
wait.for(Account.remove,{});
// Load the fixtures
var data = wait.for(fs.readFile,'./fixtures/account.json', 'utf8');
var jsonData = JSON.parse(data);
jsonData.forEach(function(json) {
var account = new Account(json);
wait.forMethod(account,'save');
}
callback();
}
That's actually the proper way of doing it, more or less. What you're doing there is a parallel loop. You can abstract it into it's own "async parallel foreach" function if you want (and many do), but that's really the only way of doing a parallel loop.
Depending on what you intended, one thing that could be done differently is the error handling. Because you're throwing, if there's a single error, that callback will never get executed (count won't be decremented). So it might be better to do:
account.save(function(err) {
if (err) return callback(err);
if (!--count) callback();
});
And handle the error in the callback. It's better node-convention-wise.
I would also change another thing to save you the trouble of incrementing count on every iteration:
var jsonData = JSON.parse(data)
, count = jsonData.length;
jsonData.forEach(function(json) {
var account = new Account(json);
account.save(function(err) {
if (err) return callback(err);
if (!--count) callback();
});
});
If you are already using underscore.js anywhere in your project, you can leverage the after method. You need to know how many async calls will be out there in advance, but aside from that it's a pretty elegant solution.

Resources