atomic write/read of a file in nodejs - node.js

My Node.js application is built around a "project file".
Several modules (by "module" I mean a plain JavaScript file in my project) need to load, modify and save this file often, via streams (fs.createReadStream, fs.createWriteStream). Because those modules run independently of each other, sometimes triggered by websocket events (for instance), I need to make the save/load operations on the project file atomic.
Consider the following scenario:
moduleA writes the project file
at the same time, before moduleA has finished writing, moduleB wants to read the file => ideally it should wait for moduleA's write to finish before actually reading (currently it reads a partially written file and detects an error)
Can Node.js do this natively, or do I have to build some sort of atomic wrapper around my read/write stream system?

To my knowledge there is nothing built in. There are modules such as redis-lock, though, that implement a lock mechanism.
If you run on a single non-clustered server, you could probably get by with a simple local lock.

This might give you an idea:
var Fs = require("fs"),
LOCK = require ("os").tmpdir () + '/foo-lock.';
function transLock(id, cb) {
Fs.open(LOCK + id, "wx", function(err, fd) {
if (err) {
// someone else has created the file
// or something went wrong
cb(err);
} else {
Fs.close(fd, function(err) {
// there should be no error here except weird stuff
// like EINTR which must be ignored on Linux
cb();
});
}
});
}
function transUnlock(id) {
Fs.unlink(LOCK + id, function(err) {
if (err) {
// something is wrong and nothing we can do except
// perhaps log something or do some background cleanup
}
});
}
function main() {
var id = "some-unique-name";
transLock(id, function(err) {
if (err)
console.log(err);
else {
// ... do your stuffs ...
transUnlock(id);
}
});
}
main();
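
A complementary technique, not part of the answer above, is to make sure readers never see a half-written file: write the new contents to a temporary file in the same directory and then rename it over the real one. POSIX specifies that rename() replaces the target atomically; behaviour on Windows or on network file systems may differ, so treat that as an assumption. The helper name writeFileAtomic and the temp-file naming scheme are made up for this sketch. It does not serialize load-modify-save cycles, so a lock like the one above is still needed for that.

```javascript
const fs = require('fs');
const path = require('path');

let seq = 0; // makes temp names unique within this process

// Hypothetical helper: write `data` to a temp file next to `target`,
// then rename the temp file over `target`.
function writeFileAtomic(target, data, cb) {
  // Same directory as the target, so the rename stays on one file system.
  const tmp = path.join(
    path.dirname(target),
    '.' + path.basename(target) + '.' + process.pid + '.' + (seq++) + '.tmp'
  );
  fs.writeFile(tmp, data, function (err) {
    if (err) return cb(err);
    fs.rename(tmp, target, function (err) {
      if (err) {
        // Best-effort cleanup of the temp file; report the rename error.
        return fs.unlink(tmp, function () { cb(err); });
      }
      cb(null);
    });
  });
}
```

A reader that opens the project file then gets either the old contents or the new contents, never a mixture.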

Related

nodeJS too many child processes?

I am using node to recursively traverse a file system and make a system call for each file, by using child.exec. It works well when tested on a small structure, with a couple of folders and files, but when run on the whole home directory, it crashes after a while
child_process.js:945
throw errnoException(process._errno, 'spawn');
^
Error: spawn Unknown system errno 23
at errnoException (child_process.js:998:11)
at ChildProcess.spawn (child_process.js:945:11)
at exports.spawn (child_process.js:733:9)
at Object.exports.execFile (child_process.js:617:15)
at exports.exec (child_process.js:588:18)
Does this happen because it uses up all resources? How can I avoid this?
EDIT: Code
improvement and best practices suggestions always welcome :)
function processDir(dir, callback) {
fs.readdir(dir, function (err, files) {
if (err) {...}
if (files) {
async.each(files, function (file, cb) {
var filePath = dir + "/" + file;
var stats = fs.statSync(filePath);
if (stats) {
if (stats.isFile()) {
processFile(dir, file, function (err) {
if (err) {...}
cb();
});
} else if (stats.isDirectory()) {
processDir(filePath, function (err) {
if (err) {...}
cb();
});
}
}
}, function (err) {
if (err) {...}
callback();
}
);
}
});
}
The issue may be caused by having too many files open simultaneously.
Consider using the async module to limit the concurrency:
https://github.com/caolan/async#eachLimit
async.eachLimit(
files,
20,
function(file, callback){
//process file here and call callback
},
function(err){
//done
}
);
In this example, at most 20 files are processed at a time.
Well, I don't know the reason for the failure, but if it is what you suspect (using up all of the resources) or what others say (too many open files), you could try multitasking. JXcore (a fork of Node.js) offers this: it lets you run a task in a separate instance, still inside one single process.
While a Node.js app as a process has its limitations, JXcore with its sub-instances multiplies those limits: a single process with even one extra instance (or task, or sub-thread, if you like) doubles the limits!
So, let's say that you run each of your spawn() calls in a separate task. Or, since tasks no longer run in the main thread, you can even use the sync method that JXcore offers: cmdSync().
Probably the best illustration is these few lines of code:
jxcore.tasks.setThreadCount(4);
var task = function(file) {
var your_cmd = "do something with " + file;
return jxcore.utils.cmdSync(your_cmd);
};
jxcore.tasks.addTask(task, "file1.txt", function(ret) {
console.log("the exit code:", ret.exitCode);
console.log("output:", ret.out);
});
Let me repeat: the task will not block the main thread, since it is running in a separate instance!
Multitasking API is documented here: Multitasking.
As has been established in comments, you are likely running out of file handles because you are running too many concurrent operations on your files. So, a solution is to limit the number of concurrent operations that run at once so too many files aren't in use at the same time.
Here's a somewhat different implementation that uses Bluebird promises to control both the async aspects of the operation and the concurrency aspects of the operation.
To make the management of the concurrency aspect easier, this collects the entire list of files into an array first and then processes the array of filenames rather than processing as you go. This makes it easier to use a built-in concurrency capability in Bluebird's .map() (which works on a single array) so we don't have to write that code ourselves:
var Promise = require("bluebird");
var fs = Promise.promisifyAll(require("fs"));
var path = require("path");
// recurse a directory, call a callback on each file (that returns a promise)
// run a max of numConcurrent callbacks at once
// returns a promise for when all work is done
function processDir(dir, numConcurrent, fileCallback) {
var allFiles = [];
function listDir(dir, list) {
var dirs = [];
return fs.readdirAsync(dir).map(function(file) {
var filePath = path.join(dir , file);
return fs.statAsync(filePath).then(function(stats) {
if (stats.isFile()) {
allFiles.push(filePath);
} else if (stats.isDirectory()) {
return listDir(filePath);
}
}).catch(function() {
// ignore errors on .stat - file could just be gone now
return;
});
});
}
return listDir(dir, allFiles).then(function() {
return Promise.map(allFiles, function(filename) {
return fileCallback(filename);
}, {concurrency: numConcurrent});
});
}
// example usage:
// pass the initial directory,
// the number of concurrent operations allowed at once
// and a callback function (that returns a promise) to process each file
processDir(process.cwd(), 5, function(file) {
// put your own code here to process each file
// this is code to cause each callback to take a random amount of time
// for testing purposes
var rand = Math.floor(Math.random() * 500) + 500;
return Promise.delay(rand).then(function() {
console.log(file);
});
}).catch(function(e) {
// error here
}).finally(function() {
console.log("done");
});
FYI, I think you'll find that proper error propagation and proper error handling from many async operations is much, much easier with promises than the plain callback method.
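
If pulling in Bluebird or async is not an option, the same "at most N at once" idea can be sketched with native promises. This is a minimal illustration rather than a replacement for either library; the name mapLimit is chosen for this sketch and is not an existing Node.js API.

```javascript
// Hypothetical helper: call `worker(item)` for every item, keeping at most
// `limit` calls in flight. Resolves with the results in input order.
function mapLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item nobody has claimed yet

  // Each runner keeps claiming the next item until the list is exhausted.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  const runners = [];
  for (let n = 0; n < Math.min(limit, items.length); n++) {
    runners.push(runner());
  }
  return Promise.all(runners).then(function () {
    return results;
  });
}
```

With a list of file paths, mapLimit(allFiles, 5, processOneFile) would start five workers and hand each one a new file as soon as it finishes the previous one (processOneFile stands for your own promise-returning function).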

Console.time always returns 0.000ms

I'm using node-webkit to create an album manager and I'm setting up a recursive scan to find all my photos. I'm scanning some 10k files, but console.time just keeps returning 0.000ms. I know the scan is happening pretty quickly, but it's not that quick. Am I doing something wrong?
var fs = require('fs');
var path = 'I:/pictures/';
console.time('read-directory');
var scanDirectory = function(path) {
fs.readdir(path,function(err,files) {
if(err) {
console.log(err);
} else {
files.forEach(function(file) {
fs.stat(path + file, function(err,stats) {
if(err) {
console.log(err);
} else {
if(stats.isDirectory()) {
scanDirectory(path + file + '/');
} else {
console.log(path + file);
}
}
});
});
}
});
}
scanDirectory(path);
console.timeEnd('read-directory');
You are using fs.readdir, which is asynchronous, so your timer does not depend on how long scanDirectory takes to finish.
In fact, calling scanDirectory(path) only starts the scan; the timer is then stopped immediately afterwards.
If you want, you can use fs.readdirSync, which is synchronous and therefore finishes before the timer is stopped. The problem is that it will freeze your application for the duration of the scan (if you use it directly like that) and will probably slow execution down.
To measure the execution time of your asynchronous function, you can use the profiler tool of node-webkit, but you will need to filter and sum the entries manually.
Another solution is to use timely (an npm package), which can time synchronous or asynchronous functions.
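
Another way to get a meaningful number is to have the scan return a promise and stop the timer only once that promise settles. The sketch below assumes a Node.js version that provides fs.promises and the withFileTypes option of readdir, which may not be the case in older node-webkit builds; timedScan is a name made up for this example.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively collect file paths. The returned promise resolves only
// after every subdirectory has been read.
async function scanDirectory(dir, found = []) {
  const entries = await fs.promises.readdir(dir, { withFileTypes: true });
  for (const entry of entries) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      await scanDirectory(full, found);
    } else {
      found.push(full);
    }
  }
  return found;
}

// Hypothetical wrapper: the timer ends when the scan has really finished
// (or failed), not when the first readdir call has merely been issued.
async function timedScan(dir) {
  console.time('read-directory');
  try {
    return await scanDirectory(dir);
  } finally {
    console.timeEnd('read-directory');
  }
}
```

Calling timedScan('I:/pictures/') then logs the elapsed time for the complete traversal.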

Can a node.js server know if a server file is created?

What is the most elegant way or technology to let a node.js server know when a file is created on the server?
The idea is: a new image has been created (from a webcam or so) -> dispatch an event!
UPDATE: The name of the new file in the directory is not known a priori and the file is generated by an external software.
You should take a look at fs.watch(). It allows you to "watch" a file or directory and receive events when things change.
Note: The documentation states that fs.watch is not consistent across platforms, so you should take that into account before using it.
fs.watch(fileOrDirectoryPath, function(event, filename) {
// Something changed with filename, trigger event appropriately
});
Also something to be aware of from the docs:
Providing filename argument in the callback is not supported on every
platform (currently it's only supported on Linux and Windows). Even on
supported platforms filename is not always guaranteed to be provided.
Therefore, don't assume that filename argument is always provided in
the callback, and have some fallback logic if it is null.
If filename is not available on your platform and you're watching a directory you may need to do something where you initially read the directory and cache the list of files in it. Then, when you get an event from fs.watch, read the directory again and compare it to the cached list of files to see what was added (if anything).
Update 1: There's a good module called watch, on github, which makes it easy to watch a directory for new files.
Update 2: I threw together an example of how to use fs.watch to get notified when new files are added to a directory. I think the module I linked to above is probably the better way to go, but I thought it would be nice to have a basic example of how it might work if you were to do it yourself.
Note: This is a fairly simplistic example just to show how it could work in general. It could almost certainly be done more efficiently and it's far from thoroughly tested.
function watchForNewFiles(directory, callback) {
// Get a list of all the files in the directory
fs.readdir(directory, function(err, files) {
if (err) {
callback(err);
} else {
var originalFiles = files;
// Start watching the directory for new events
var watcher = fs.watch(directory, function(event, filename) {
// Get the updated list of all the files in the directory
fs.readdir(directory, function(err, files) {
if (err) {
callback(err);
} else {
// Filter out any files we already knew about
var newFiles = files.filter(function(f) {
return (originalFiles.indexOf(f) < 0);
});
// Reset our list of "original" files
originalFiles = files;
// If there are new files detected, call the callback
if (newFiles.length) {
callback(null, newFiles);
}
}
})
});
}
});
}
Then, to watch a directory you'd call it with:
watchForNewFiles(someDirectoryPath, function(err, files) {
if (err) {
// handle error
} else {
// handle any newly added files
// "files" is an array of filenames that have been added to the directory
}
});
I came up with my own solution using this code here:
var fs = require('fs');
var intID = setInterval(check,1000);
function check() {
fs.exists('file.txt', function check(exists) {
if (exists) {
console.log("Created!");
clearInterval(intID);
}
});
}
You could add a parameter to the check function with the name of the file and call it in the path.
I did some tests on fs.watch() and it does not work if the file is not created. fs.watch() has multiple issues anyways and I would never suggest using it... It does work to check if the file was deleted though...

node.js file system problems

I keep banging my head against the wall because of tons of different errors. This is the code I am trying to use:
fs.readFile("balance.txt", function (err, data) //At the beginning of the script (checked, it works)
{
if (err) throw err;
balance=JSON.parse(data);;
});
fs.readFile("pick.txt", function (err, data)
{
if (err) throw err;
pick=JSON.parse(data);;
});
/*....
.... balance and pick are modified
....*/
if (shutdown)
{
fs.writeFile("balance2.txt", JSON.stringify(balance));
fs.writeFile("pick2.txt", JSON.stringify(pick));
process.exit(0);
}
At the end of the script, the files have not been modified in the slightest. I then found out on this site that the files were being opened twice simultaneously, or something like that, so I tried this:
var balance, pick;
var stream = fs.createReadStream("balance.txt");
stream.on("readable", function()
{
balance = JSON.parse(stream.read());
});
var stream2 = fs.createReadStream("pick.txt");
stream2.on("readable", function()
{
pick = JSON.parse(stream2.read());
});
/****
****/
fs.unlink("pick.txt");
fs.unlink("balance.txt");
var stream = fs.createWriteStream("balance.txt", {flags: 'w'});
var stream2 = fs.createWriteStream("pick.txt", {flags: 'w'});
stream.write(JSON.stringify(balance));
stream2.write(JSON.stringify(pick));
process.exit(0);
But this time, both files are empty... I know I should catch errors, but I just don't see where the problem is. I don't mind storing the two objects in the same file, if that helps. Besides that, I had never written any JavaScript before yesterday, so please give me a simple explanation if you know what failed here.
What I think you want to do is use readFileSync and not use readFile to read your files since you need them to be read before doing anything else in your program (http://nodejs.org/api/fs.html#fs_fs_readfilesync_filename_options).
This will make sure you have read both the files before you execute any of the rest of your code.
Make your code look like this:
try
{
balance = JSON.parse(fs.readFileSync("balance.txt"));
pick = JSON.parse(fs.readFileSync("pick.txt"));
}
catch(err)
{ throw err; }
I think you will get the functionality you are looking for by doing this.
Note, you will not be able to check for an error in the same way you can with readFile. Instead you will need to wrap each call in a try catch or use existsSync before each operation to make sure you aren't trying to read a file that doesn't exist.
How to capture no file for fs.readFileSync()?
Furthermore, you have the same problem on the writes. You are kicking off async writes and then immediately calling process.exit(0). A better way to do this would be to either write them sequentially asynchronously and then exit or to write them sequentially synchronously then exit.
Async option:
if (shutdown)
{
fs.writeFile("balance2.txt", JSON.stringify(balance), function(err){
fs.writeFile("pick2.txt", JSON.stringify(pick), function(err){
process.exit(0);
});
});
}
Sync option:
if (shutdown)
{
fs.writeFileSync("balance2.txt", JSON.stringify(balance));
fs.writeFileSync("pick2.txt", JSON.stringify(pick));
process.exit(0);
}
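
On Node.js versions that ship fs.promises, the same "finish the write before exiting" ordering can also be expressed with async/await. This is only a sketch under that assumption; loadJson, saveJson and updateJson are helper names invented here, not part of the fs API.

```javascript
const fs = require('fs');

// Hypothetical helper: read a file and parse it as JSON.
async function loadJson(file) {
  return JSON.parse(await fs.promises.readFile(file, 'utf8'));
}

// Hypothetical helper: serialize a value and write it out.
async function saveJson(file, value) {
  await fs.promises.writeFile(file, JSON.stringify(value));
}

// Load, modify, save. The returned promise settles only after the
// write has completed, so it is safe to exit afterwards.
async function updateJson(inFile, outFile, modify) {
  const value = await loadJson(inFile);
  modify(value);
  await saveJson(outFile, value);
  return value;
}
```

Calling process.exit(0) inside a .then() attached to updateJson("balance.txt", "balance2.txt", ...) guarantees the data is on its way to disk before the process ends, which the original code did not.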

Creating a file only if it doesn't exist in Node.js

We have a buffer we'd like to write to a file. If the file already exists, we need to increment an index on it, and try again. Is there a way to create a file only if it doesn't exist, or should I just stat files until I get an error to find one that doesn't exist already?
For example, I have files a_1.jpg and a_2.jpg. I'd like my method to try creating a_1.jpg and a_2.jpg, and fail, and finally successfully create a_3.jpg.
The ideal method would look something like this:
fs.writeFile(path, data, { overwrite: false }, function (err) {
if (err) throw err;
console.log('It\'s saved!');
});
or like this:
fs.createWriteStream(path, { overwrite: false });
Does anything like this exist in node's fs library?
EDIT: My question isn't if there's a separate function that checks for existence. It's this: is there a way to create a file if it doesn't exist, in a single file system call?
As your intuition correctly guessed, the naive solution with a pair of exists / writeFile calls is wrong. Asynchronous code runs in unpredictable ways, and in this case the sequence could be:
Is there a file a.txt? — No.
(File a.txt gets created by another program)
Write to a.txt if it's possible. — Okay.
But yes, we can do that in a single call. We're working with the file system, so it's a good idea to read the developer manual on fs. And hey, here's an interesting part.
'w' - Open file for writing. The file is created (if it does not
exist) or truncated (if it exists).
'wx' - Like 'w' but fails if path exists.
So all we have to do is just add wx to the fs.open call. But hey, we don't like fopen-like IO. Let's read on fs.writeFile a bit more.
fs.readFile(filename[, options], callback)
filename String
options Object
encoding String | Null default = null
flag String default = 'r'
callback Function
That options.flag looks promising. So we try
fs.writeFile(path, data, { flag: 'wx' }, function (err) {
if (err) throw err;
console.log("It's saved!");
});
And it works perfectly for a single write. I suspect this code will still fail in more bizarre ways if you try to solve your task with it. You have an atomic "check whether a_#.jpg exists, and write it if it doesn't" operation, but the rest of the fs state is not locked, and a_1.jpg may spontaneously disappear while you're already checking a_5.jpg. Most* file systems are not ACID databases, and the fact that you're able to do at least some atomic operations is miraculous. It's very likely that the wx code won't work on some platform. So, for the sake of your sanity, use a database.
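
Applied to the a_1.jpg / a_2.jpg / a_3.jpg case from the question, the wx flag can be combined with a retry on EEXIST. This is a sketch with the caveats above still in force; writeNextFree and its parameters are names made up for this example.

```javascript
const fs = require('fs');

// Hypothetical helper: try prefix_1.ext, prefix_2.ext, ... and write `data`
// to the first name that does not exist yet. Calls cb(err, fileName).
function writeNextFree(prefix, ext, data, cb, index) {
  index = index || 1;
  const fileName = prefix + '_' + index + ext;
  // 'wx' makes the write fail with EEXIST instead of overwriting.
  fs.writeFile(fileName, data, { flag: 'wx' }, function (err) {
    if (err && err.code === 'EEXIST') {
      // Name already taken: move on to the next index.
      return writeNextFree(prefix, ext, data, cb, index + 1);
    }
    cb(err, err ? undefined : fileName);
  });
}
```

Each attempt is a single create-if-absent call, so two writers racing for the same name cannot both succeed; one of them sees EEXIST and moves on.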
Some more info for the suffering
Imagine we're writing something like memoize-fs that caches results of function calls to the file system to save us some network/cpu time. Could we open the file for reading if it exists, and for writing if it doesn't, all in a single call? Let's take another look at those flags. After a bit of mental exercise we can see that a+ does what we want: if the file doesn't exist, it creates one and opens it for both reading and writing, and if the file exists it does so without clearing the file (as w+ would). But we cannot use it in either the (smth)File or the create(Smth)Stream functions. And that seems like a missing feature.
So feel free to file it as a feature request (or even a bug) on the Node.js GitHub, as the lack of an atomic asynchronous file system API is a drawback of Node. Don't expect changes any time soon, though.
Edit. I would like to link to articles by Linus and by Dan Luu on why exactly you don't want to do anything smart with your fs calls, because the claim above was left largely unsupported.
What about using the a option?
According to the docs:
'a+' - Open file for reading and appending. The file is created if it does not exist.
It seems to work perfectly with createWriteStream
This method is no longer recommended. fs.exists is deprecated. See comments.
Here are some options:
1) Have 2 "fs" calls. The first one is the "fs.exists" call, and the second is "fs.write / read, etc"
//checks if the file exists.
//If it does, it just calls back.
//If it doesn't, then the file is created.
function checkForFile(fileName,callback)
{
fs.exists(fileName, function (exists) {
if(exists)
{
callback();
}else
{
fs.writeFile(fileName, '', {flag: 'wx'}, function (err)
{
callback();
})
}
});
}
function writeToFile()
{
checkForFile("file.dat",function()
{
//It is now safe to write/read to file.dat
fs.readFile("file.dat", function (err,data)
{
//do stuff
});
});
}
2) Or Create an empty file first:
--- Sync:
//If you want to force the file to be empty then you want to use the 'w' flag:
var fd = fs.openSync(filepath, 'w');
//That will truncate the file if it exists and create it if it doesn't.
//Wrap it in an fs.closeSync call if you don't need the file descriptor it returns.
fs.closeSync(fs.openSync(filepath, 'w'));
--- ASync:
var fs = require("fs");
fs.open(path, "wx", function (err, fd) {
// handle error
fs.close(fd, function (err) {
// handle error
});
});
3) Or use "touch": https://github.com/isaacs/node-touch
To do this in a single call you can use the fs-extra npm module (a single function call, that is; internally it still performs several file system operations).
After this the file will have been created as well as the directory it is to be placed in.
const fs = require('fs-extra');
const file = '/tmp/this/path/does/not/exist/file.txt'
fs.ensureFile(file, err => {
console.log(err) // => null
});
Another way is to use ensureFileSync which will do the same thing but synchronous.
const fs = require('fs-extra');
const file = '/tmp/this/path/does/not/exist/file.txt'
fs.ensureFileSync(file)
With async / await and Typescript I would do:
import * as fs from 'fs'
async function upsertFile(name: string) {
try {
// try to read file
await fs.promises.readFile(name)
} catch (error) {
// create empty file, because it wasn't found
await fs.promises.writeFile(name, '')
}
}
Here's a synchronous way of doing it:
try {
fs.truncateSync(filepath, 0);
} catch (err) {
fs.writeFileSync(filepath, "", { flag: "wx" });
}
If the file exists it gets truncated; otherwise truncateSync throws and the file is created in the catch block.
This works for me.
// Use the file system fs promises
const {access} = require('fs/promises');
// File Exist returns true
// dont use exists which is no more!
const fexists =async (path)=> {
try {
await access(path);
return true;
} catch {
return false;
}
}
// Wrapper for your main program
async function mainapp(){
if( await fexists("./users.json")){
console.log("File is here");
} else {
console.log("File not here -so make one");
}
}
// run your program
mainapp();
Just keep an eye on your async/await so everything plays nicely.
Hope this helps.
You can do something like this:
function writeFile(data, i){
i = i || 0;
var fileName = 'a_' + i + '.jpg';
fs.exists(fileName, function (exists) {
if(exists){
// name taken, try the next index
writeFile(data, i + 1);
} else {
fs.writeFile(fileName, data, function (err) {
if (err) throw err;
});
}
});
}
