I'm writing an app in Node and have been running into a rare but damaging problem.
I have a schedule.txt that I write to when the user makes a change, but I also read it every second and parse it for use throughout the program.
Occasionally, while the app is writing to the file (asynchronously), the timer-driven read picks up the same file mid-write, and parsing it fails.
I know that from a design standpoint this may just be bound to happen... but I'm wondering if there is a quick fix I can do for now. Would using writeFileSync help my situation (make it more 'atomic')? I just want to make sure the app doesn't read the file while another process is still writing to it.
TIA!
Niko
Seems like you'd want to serialize your read/writes. If it were me, I might try having a "manager" object which encapsulates the serialization, which you'd use like:
var fileManager = require('./file-manager');

// somewhere in the program
fileManager.scheduleWrite(data, function(err){
    // now the write is done
});

// somewhere else in the program
fileManager.scheduleRead(function(err, data){
    // `data` contains the data
});
Then implement it using Q or a similar promises lib, like:
// in file-manager.js
var Q = require('q');

var wait = Q();

module.exports = {
    scheduleWrite: function(data, cb){
        wait = wait.then(function(){
            // write data and call cb()
        });
    },
    scheduleRead: function(cb){
        wait = wait.then(function(){
            // read data and call cb(data)
        });
    }
};
The wait var will "stack up" into a serialized chain of tasks where the next one won't start until the previous one completes.
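Fleshed out, that might look something like the following sketch (untested; it assumes the file lives at ./schedule.txt, a hypothetical path, and uses Q.nfcall to adapt the Node-style fs callbacks into promises):

// file-manager.js -- a fuller sketch of the pattern above
var fs = require('fs');
var Q = require('q');

var FILE = './schedule.txt'; // hypothetical path to the shared file
var wait = Q();              // an already-resolved promise anchoring the chain

module.exports = {
    scheduleWrite: function(data, cb){
        wait = wait.then(function(){
            // Q.nfcall adapts a Node-style callback API to a promise
            return Q.nfcall(fs.writeFile, FILE, data, 'utf8')
                .then(function(){ cb(null); },
                      function(err){ cb(err); });
        });
    },
    scheduleRead: function(cb){
        wait = wait.then(function(){
            return Q.nfcall(fs.readFile, FILE, 'utf8')
                .then(function(data){ cb(null, data); },
                      function(err){ cb(err); });
        });
    }
};

Handling errors in the second .then handler keeps the chain alive, so one failed read or write doesn't wedge every task queued behind it.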
Related
I'm trying to check if a file exists, and if it doesn't, create the file.
self.checkFeedbackFile = function() {
    // attempt to read the file - if it does not exist, create the file
    var feedbackFile = fs.readFile('feedback.log', function (err, data) {
        console.log("Checking that the file exists.");
    });

    if (feedbackFile === undefined) {
        console.log("File does not exist. Creating a new file...");
    }
}
I'm obviously very new to node. Been working in Ruby for a while, and I only have a little bit of experience in Javascript, so the concept of callbacks and async execution is quite foreign to me.
Right now my console is returning the following:
File does not exist. Creating a new file...
Sat Sep 29 2018 12:59:12 GMT-0400 (Eastern Daylight Time): Node server started on 127.0.0.1:3333 ...
Checking that the file exists.
In addition to not being sure how to do this, what is the ELI5 explanation for why console logs are printing out of order?
In your case, fs.readFile() only starts the read; it returns immediately and the I/O completes later. Meanwhile, checkFeedbackFile() continues straight on to the if statement.
I would recommend using fs.stat to check whether the file exists, and fs.writeFileSync to write to the file synchronously.
self.checkFeedbackFile = function() {
    // stat the file - if it does not exist, create the file
    fs.stat('feedback.log', function(err, stats){
        if(err){
            console.log("File doesn't exist, creating a new file");
            fs.writeFileSync('feedback.log', ''); // create an empty file
        }
    });
}
Node.js is async. If you are coming from C or Java, you are used to this:

function main(){
    one();
    two();
    three();
}

In C or Java, control moves to two() only when one() has finished. That is not the case with Node: depending on what one() is doing, if it does anything in an async way, say I/O, then two() will be executed before one() completes. Hence you see async methods taking a callback, which is executed once the relevant operation completes.
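For instance (a minimal sketch; some-file.txt is a stand-in):

var fs = require('fs');

function one() {
    fs.readFile('some-file.txt', function (err, data) { // hypothetical file
        console.log('one: read complete');
    });
}

function two() {
    console.log('two: runs first');
}

one();
two(); // prints "two: runs first" before "one: read complete"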
I would recommend taking a look at how Node's event loop works.
Ok. In your main function you have two console.logs. console.log("Checking that the file exists."); is inside a callback, but console.log("File does not exist. Creating a new file..."); is just inside an if block, so it fires first.

The console.log("Checking that the file exists."); call depends on the readFile operation: you wrap it in the callback given as the second argument to readFile, and that callback is triggered with the result once the read completes. All other code at the same level as the readFile call executes as if readFile had already finished, because the call does not block; you have provided a callback to be executed when the operation completes.

This behavior is different from synchronous programming.
console.log('first');
console.log('second');
setTimeout(function(){
    console.log('third');
}, 2000);
console.log('Fourth');
In the code above, synchronous programming would execute line by line: to log the third message, execution would wait two seconds. But in non-blocking (asynchronous) programming, 'Fourth' is printed before console.log('third') ever runs.
Background
I am working on a C# program which currently runs Node via Process.Start(). I am capturing the stdout and stderr from this child process and redirecting it for my own reasons. I am looking into replacing the invocation of Node.exe with a call to Edge.js instead. In order to be able to do this I must be able to reliably capture stdout and stderr from the Javascript running within Edge, and get the messages back into my C# application.
Approach 1
I'll describe this approach for completeness in case anybody recommends it :)
If the Edge code terminates, it is fairly easy to deal with this by declaring a msgs array and overwriting process.stdout.write and process.stderr.write with new functions that accumulate messages on that array, then at the end returning the msgs array. Example:
var msgs = [];

process.stdout.write = function (string) {
    msgs.push({ stream: 'o', message: string });
};

process.stderr.write = function (string) {
    msgs.push({ stream: 'e', message: string });
};

// Return to caller.
var result = { messages: msgs /* ...other stuff... */ };
callback(null, result);
Obviously this only works if the Edge code terminates, and msgs may grow large in the worst case. However, it is likely to perform well because only one marshalling call is necessary to get all the messages back.
Approach 2
This is a little harder to explain. Instead of accumulating messages, we "hook" stdout and stderr using a delegate we send in from C#. In the C#, we create an object that we will pass into Edge, and that object has a property called stdoutHook:
dynamic payload = new ExpandoObject();
payload.stdoutHook = GetStdoutHook();

public Func<object, Task<object>> GetStdoutHook()
{
    Func<object, Task<object>> hook = (message) =>
    {
        TheLogger.LogMessage((message as string).Trim());
        return Task.FromResult<object>(null);
    };
    return hook;
}
I could really get away with an Action, but Edge appears to require the Func<object, Task<object>>; it won't proxy the function otherwise. Then, in the Javascript, we can detect that function and use it like this:
var func = Edge.Func(#"
return function(payload, callback) {
if (typeof (payload.stdoutHook) === 'function') {
process.stdout.write = payload.stdoutHook;
}
// do lots of stuff while stdout and stderr are hooked...
var what = require('whatever');
what.futz();
// terminate.
callback(null, result);
}");
dynamic result = func(payload).Result;
Questions
Q1. Both of these techniques seem to work, but is there a better way of doing this, something built-in to Edge perhaps that I have missed? Both solutions are invasive - they require some shim code to wrap the actual work that is to be done in Edge. This is not the end of the world, but it would be better if there was a non-invasive method.
Q2. In approach 2, where I have to return a task here
return Task.FromResult<object>(null);
it feels wrong to be returning an already completed "null task". But is there another way of writing this?
Q3. Do I need to be more rigorous in the Javascript code when hooking stdout and stderr? I note in double-edge.js there is this code, frankly I am not sure what is happening here, but it is quite a bit more complex than my crude overwriting of process.stdout.write :-)
// Fix #176 for GUI applications on Windows
try {
    var stdout = process.stdout;
}
catch (e) {
    // This is a Windows GUI application without stdout and stderr defined.
    // Define process.stdout and process.stderr so that all output is discarded.
    (function () {
        var stream = require('stream');
        var NullStream = function (o) {
            stream.Writable.call(this);
            this._write = function (c, e, cb) { cb && cb(); };
        }
        require('util').inherits(NullStream, stream.Writable);
        var nullStream = new NullStream();
        process.__defineGetter__('stdout', function () { return nullStream; });
        process.__defineGetter__('stderr', function () { return nullStream; });
    })();
}
Q1: There isn't anything built into Edge that would make capturing stdout or stderr of Node.js code automatic when calling Node from CLR. At some point I thought of writing an extension of Edge that would make marshaling Streams across CLR/V8 boundary easy. Under the hood it would be very similar to your Approach 2. It could be done as a standalone module on top of Edge.
Q2: Returning a completed task is very appropriate in this case. Your function has captured the Node.js output, processed it, and has in fact "completed" in that sense. Returning a task completed with Null is really a moral equivalent of returning from an Action.
Q3: The code you are pointing to is only relevant in Windows GUI applications, not Console applications. If you are writing a Console application, simply overriding write should suffice at the level of the Node.js code you pass to Edge.js. Note that the signature of write in Node allows an optional encoding parameter to be passed in. You seem to ignore it both in Approach 1 and 2. In particular in Approach 2 I would suggest wrapping the JavaScript proxy to C# callback into a JavaScript function that normalizes the parameters before assigning it to process.stdout.write. Otherwise Edge.js code may assume that the encoding parameter passed to a write call is a callback function which would follow the Edge.js calling convention.
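To illustrate that last point, a sketch of such a normalizing wrapper (assuming payload.stdoutHook follows the Edge.js (input, callback) convention, as in Approach 2):

if (typeof payload.stdoutHook === 'function') {
    process.stdout.write = function (chunk, encoding, done) {
        // Node allows write(chunk, [encoding], [callback]), so shift the
        // arguments when the encoding slot actually holds the callback.
        if (typeof encoding === 'function') {
            done = encoding;
            encoding = undefined;
        }
        payload.stdoutHook(chunk.toString(), function (err) {
            if (done) done(err);
        });
        return true; // pretend the underlying stream accepted the chunk
    };
}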
I have a NodeJS application which uses the fs API to read files from a directory tree. I'm using the fs-walk module to walk the tree. For every sub directory encountered, the same function executes again to handle it. (I don't think this is recursion; rather, the same function is bound to an event which is fired each time a directory is handled.) Files are handled by a different function, which does stuff to them.
I'd like to execute arbitrary code once all files have been read without using synchronous or blocking code. I couldn't find any way to keep track of the number of files in a directory (to count down, for instance), nor could I find any attribute in fs.stat to indicate that the entire operation has completed.
Has anyone found a way to do this yet? I could find nothing in the Node docs or on Stack Overflow.
After reviewing the fs-walk library a little closer, it looks like the third argument to the walk() method is actually a final callback. Internally they are using the async library, specifically async.whilst() and async.waterfall() methods which will execute the final callback when everything is complete.
I think the intention of the library creator is for that final callback to be executed when all async actions are completed. If that isn't working, you may want to file an issue on GitHub for it.
According to the code, you should be able to do:
var walk = require('fs-walk');

walk('/some/dir', someFileOrDirHandler, function(err) {
    // This is the final callback; if the first argument is present,
    // then there was an error
    if (err) {
        /* handle it */
        return;
    }

    // Getting here indicates success
});
As a compromise in performance, I ended up doing a total file count using a recursive function that accessed the file system synchronously. Using the total, I then accessed all the files asynchronously, decrementing the total each time. Once the total reached zero, I executed a function to handle all of the completed data.
var countAllFiles = new Promise(function (resolve, reject) {
    var total = 0,
        count = function (path) {
            var contents = fs.readdirSync(path), file, name;
            for (file in contents) {
                if (!contents.hasOwnProperty(file)) continue;
                name = path + '/' + contents[file];
                if (fs.statSync(name).isDirectory())
                    count(name);
                else
                    ++total;
            }
        };
    count('/path/to/tree/');
    resolve(total);
}).then(function (total) {
    walk.dirs('/path/to/tree/', handlerFunction, errorHandler);
    // for every file, decrement total. Then, if it's zero, execute the code that
    // depends on all the read/write operations being complete
});
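The decrementing handler hinted at in that comment might look roughly like this (a sketch; processFile and onAllDone are hypothetical names, and it assumes fs-walk's (basedir, filename, stat, next) handler signature):

function makeHandler(total, onAllDone) {
    return function handlerFunction(basedir, filename, stat, next) {
        // processFile is whatever async read/write work you do per file
        processFile(basedir + '/' + filename, function (err) {
            if (--total === 0) onAllDone(); // last one done: run the dependent code
            next(err);
        });
    };
}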
I have a file with a lot of entries (10+ million), each representing a partial document that is being saved to a mongo database (based on some criteria, non-trivial).
To avoid overloading the database (which is doing other operations at the same time), I wish to read in chunks of X lines, wait for them to finish, read the next X lines, etc.
Is there any way to use any of the fs callback mechanisms to also "halt" progress at a certain point, without blocking the entire program? From what I can tell, they all run from start to finish with no way of stopping them, unless you stop reading the file entirely.
The issue is that because of the file size, memory also becomes a problem: because of the time the updates take, a LOT of the data is held in memory, exceeding the 1 GB limit and causing the program to crash. Secondarily, as I said, I don't want to queue 1 million updates and completely stress the mongo database.
Any and all suggestions welcome.
UPDATE: Final solution using line-reader (available via npm) below, in pseudo-code.
var lineReader = require('line-reader');
var filename = <wherever you get it from>;

lineReader(filename, function(line, last, cb) {
    //
    // Do work here, line contains the line data
    // last is true if it's the last line in the file
    //
    function checkProcessed(callback) {
        if (doneProcessing()) { // Implement doneProcessing to check whether whatever you are doing is done
            callback();
        }
        else {
            setTimeout(function() { checkProcessed(callback); }, 100); // Adjust timeout according to expected time to process one line
        }
    }

    checkProcessed(cb);
});
This is implemented to make sure doneProcessing() returns true before attempting to work on more lines - this means you can effectively throttle whatever you are doing.
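For completeness, one way to implement that bookkeeping (a sketch; LIMIT, inFlight, and saveToMongo are hypothetical names, assuming you cap the number of in-flight database updates):

var LIMIT = 100;  // max updates allowed in flight at once
var inFlight = 0;

function startUpdate(doc, done) {
    inFlight++;
    saveToMongo(doc, function (err) { // your actual database write
        inFlight--;
        done(err);
    });
}

function doneProcessing() {
    return inFlight < LIMIT; // safe to take on another line?
}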
I don't use MongoDB and I'm not an expert in using Lazy, but I think something like below might work or give you some ideas. (note that I have not tested this code)
var fs = require('fs'),
    lazy = require('lazy');

var readStream = fs.createReadStream('yourfile.txt');

var file = lazy(readStream)
    .lines          // ask to read the stream line by line
    .take(100)      // and read 100 lines at a time.
    .join(function(oneHundredLines){
        readStream.pause(); // pause reading the stream
        writeToMongoDB(oneHundredLines, function(err){
            // error checking goes here
            // resume the stream 1 second after MongoDB finishes saving.
            setTimeout(readStream.resume, 1000);
        });
    });
I'm attempting to load a store catalog into MongoDb (2.2.2) using Node.js (0.8.18) and Mongoose (3.5.4) -- all on Windows 7 64bit. The data set contains roughly 12,500 records. Each data record is a JSON string.
My latest attempt looks like this:
var fs = require('fs');
var odir = process.cwd() + '/file_data/output_data/';
var mongoose = require('mongoose');
var Catalog = require('./models').Catalog;
var conn = mongoose.connect('mongodb://127.0.0.1:27017/sc_store');

exports.main = function(callback){
    var catalogArray = fs.readFileSync(odir + 'pc-out.json','utf8').split('\n');
    var i = 0;
    Catalog.remove({}, function(err){
        while(i < catalogArray.length){
            new Catalog(JSON.parse(catalogArray[i])).save(function(err, doc){
                if(err){
                    console.log(err);
                } else {
                    i++;
                }
            });
            if(i === catalogArray.length -1) return callback('database populated');
        }
    });
};
I have had a lot of problems trying to populate the database. Under previous scenarios (and this one), node pegs the processor and eventually runs out of memory. Note that in this scenario, I'm trying to allow Mongoose to save a record, and then iterate to the next record once the record saves.
But the iterator inside of the Mongoose save function never gets incremented. In addition, it never throws any errors. But if I put the iterator (i) outside of the asynchronous call to Mongoose, it works, provided the number of records I try to load is not too big (I have successfully loaded 2,000 this way).
So my questions are: Why isn't the iterator inside of the Mongoose save call ever incremented? And, more importantly, what is the best way to load a large data set into MongoDb using Mongoose?
Rob
i is your index into catalogArray for pulling the input data, but you're also trying to use it to keep track of how many documents have been saved, which isn't possible. Try tracking them separately like this:
var i = 0;
var saved = 0;
Catalog.remove({}, function(err){
    while(i < catalogArray.length){
        new Catalog(JSON.parse(catalogArray[i])).save(function(err, doc){
            saved++;
            if(err){
                console.log(err);
            } else {
                if(saved === catalogArray.length) {
                    return callback('database populated');
                }
            }
        });
        i++;
    }
});
UPDATE
If you want to add tighter flow control to the process, you can use the async module's forEachLimit function to limit the number of outstanding save operations to whatever you specify. For example, to limit it to one outstanding save at a time:
var async = require('async');

Catalog.remove({}, function(err){
    async.forEachLimit(catalogArray, 1, function (catalog, cb) {
        new Catalog(JSON.parse(catalog)).save(function (err, doc) {
            if (err) {
                console.log(err);
            }
            cb(err);
        });
    }, function (err) {
        callback('database populated');
    });
});
Rob,
The short answer:
You created an infinite loop. You're thinking synchronously and with blocking; JavaScript works asynchronously and without blocking. What you are trying to do is like trying to turn the feeling of hunger directly into a sandwich. You can't. The closest thing is to use the feeling of hunger to motivate you to go to the kitchen and make one. Don't try to make JavaScript block. It won't work. Now, learn async.forEachLimit. It will work for what you want to do here.
You should probably review asynchronous design patterns and understand what it means on a deeper level. Callbacks are not simply an alternative to return values. They are fundamentally different in how and when they are executed. Here is a good primer: http://cs.brown.edu/courses/csci1680/f12/handouts/async.pdf
The long answer:
There is an underlying problem here, and that is your lack of understanding of what non-blocking I/O and asynchronous execution mean. I'm not sure if you are breaking into node development, or if this is just a one-off project, but if you plan to continue using node (or any asynchronous language) then it is worth the time to understand the difference between synchronous and asynchronous design patterns, and the motivations for them. That is why you have a logic error: putting the loop's increment inside an asynchronous callback creates an infinite loop.
In non-computer-science terms, that means your increment to i will never occur. The reason is that JavaScript executes a single block of code to completion before any asynchronous callbacks are called. So in your code, your loop runs over and over without i ever incrementing, and in the background you are queueing up saves of the same document. Each iteration of the loop starts sending the document at index 0 to mongo; the callback can't fire until your loop ends and all other code outside the loop runs to completion, so the callbacks queue up. But your loop runs again, since i++ is never executed (remember, the callbacks are queued until your code finishes), inserting record 0 again and queueing another callback to execute AFTER your loop is complete. This goes on and on until your memory is filled with callbacks waiting to inform your infinite loop that document 0 has been inserted millions of times.
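You can see the same mechanism in miniature with a timer in place of a database save (a deliberately broken sketch; do not expect it to finish):

var done = false;

setTimeout(function () { done = true; }, 100); // queued, but can never run

while (!done) {
    // Spins forever: the event loop cannot deliver the timer callback
    // until this synchronous block returns, and it never does.
}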
In general, there is no way to make JavaScript block without doing something really, really bad. For example, something tantamount to setting your kitchen on fire to fry some eggs for that sandwich I talked about in the "short answer".
My advice is to take advantage of libs like async: https://github.com/caolan/async. JohnnyHK mentioned it here, and he was right to do so.