Why is fs.watchFile called twice in Node? - node.js

Ubuntu 12.04 Node v0.6.14 CoffeeScript 1.3.1
fs.watchFile coffee_eval, (e) ->
  console.log e
  result = spawn 'coffee', ['-bc', coffee_eval]
  msg = ''
  result.stderr.on 'data', (str) ->
    msg += str
  result.stderr.on 'end', ->
    console.log 'msg: ', msg
    print "!! #{coffee_eval}\n"
Whole code on gist: https://gist.github.com/2621576
Every time I save the watched file, the callback is called twice rather than once.
My editor is Sublime Text 2.
The duplicated output can be seen on each save.

fs.watchFile is unstable. From the node docs:
fs.watchFile(filename, [options], listener)
Stability: 2 - Unstable. Use fs.watch instead, if available.
You can try fs.watch, but unfortunately it may suffer from the same problem. I had the same issue with fs.watch on windows, when trying to create a similar monitor script.
The workaround was to log the time when the modification occurred and ignore the second change if it was triggered within a few milliseconds. A bit ugly, but it worked.
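A minimal sketch of that time-window workaround might look like this (the file name and the 50 ms window are arbitrary placeholders):

var fs = require('fs');

var lastChange = 0;
fs.watch('file.txt', function (eventType, filename) {
  var now = Date.now();
  if (now - lastChange < 50) return; // the second event of the pair arrives almost immediately, drop it
  lastChange = now;
  console.log(eventType, filename);  // handle the real change here
});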

The problem is still present; here is the workaround I have found.
var fs = require('fs');
var actionDone = {};
fs.watch('.', function (eventType, filename) {
  var path = './' + filename;
  var stats = fs.statSync(path);
  var seconds = +stats.mtime;
  if (actionDone[filename] == seconds) return;
  actionDone[filename] = seconds;
  // write your code here
});
We check whether the last modified time has changed before continuing.

I would suggest trying node-inotify-plusplus (https://github.com/coolaj86/node-inotify-plusplus) which has worked much better for me than fs.watchFile or fs.watch.

If you are using underscore or lodash, you could consider using throttle and only act on the trailing edge, discarding the extra leading calls. A basic example would be:
var fs = require('fs');
var _ = require("lodash");

function FileWatcher(fileName) {
  this.file = fileName;
  this.onChange = _.throttle(this.trigger, 100, { leading: false });
}

FileWatcher.prototype.observe = function () {
  fs.watch(this.file, this.onChange);
};

FileWatcher.prototype.trigger = function () {
  console.log("file changed :)");
};

var fileToWatch = __dirname + "/package.json";
new FileWatcher(fileToWatch).observe();

This is an old question but a common one that needs a more up-to-date answer: it is the editor updating the content multiple times.
First and foremost, it is advised to use fs.watch instead. You may experience the same problem with fs.watch, but it does not fire the same event multiple times for a single change; you get that because your editor is updating the file content multiple times.
We can test this using a simple Node server which writes to the file when it receives a request:
const http = require('http');
const fs = require('fs');
const path = require('path');
const host = 'localhost';
const port = 3000;
const file = path.join(__dirname, 'config.json');

const requestListener = function (req, res) {
  const data = new Date().toString();
  fs.writeFileSync(file, data, { encoding: 'utf-8' });
  res.end(data);
};

const server = http.createServer(requestListener);
server.listen(port, host, () => {
  fs.watch(file, (eventType, filename) => {
    console.log({ eventType });
  });
  console.log(`Server is running on http://${host}:${port}`);
});
Visit the server and observe the output. You will see it logs a single event:
Server is running on http://localhost:3000
{ eventType: 'change' }
If you edit the file in an editor like VSCode, you may see multiple events are logged.
That is because editors tend to use the stream API for efficiency: they read and write files in chunks, which causes the change event to fire multiple times depending on the chunk size and the file length.
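If you want to reproduce that behaviour yourself, a rough sketch (reusing the config.json path from the snippet above) is to write the file in several chunks; whether you see one or several events depends on the OS and how the chunks are flushed:

const fs = require('fs');
const path = require('path');

const file = path.join(__dirname, 'config.json');
fs.writeFileSync(file, '');                    // make sure the file exists before watching

fs.watch(file, (eventType) => console.log({ eventType }));

const stream = fs.createWriteStream(file);     // editors save roughly like this
stream.write('chunk one\n');
stream.write('chunk two\n');
stream.end('chunk three\n');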
Since the duplicate events see the same file stats, you can use the modification time to eliminate them:
const { stat } = require('fs/promises');

let lastModified;
fs.watch(file, (eventType, filename) => {
  stat(file).then(({ mtimeMs }) => {
    if (lastModified !== mtimeMs) {
      lastModified = mtimeMs;
      // Do your work here! It will run once!
      console.log({ eventType, filename });
    }
  });
});
Check this answer to see how you can use it with other fs methods: https://stackoverflow.com/a/75149864/7134134

To solve this problem, I keep track of the previous "file modified" timestamp and don't run my normal callback code if the value is the same.
var fs = require('fs');

var filename = "/path/to/file";
var previousMTime = new Date(0);

var watcher = fs.watch(filename, {
  persistent: false
}, function () {
  fs.stat(filename, function (err, stats) {
    if (stats.mtime.valueOf() === previousMTime.valueOf()) {
      console.log("File Update Callback Stopped (same revision time)");
      return;
    }
    previousMTime = stats.mtime;
    // do your interesting stuff down here
  });
});

I solved this problem by flipping an 'ignore' flag from false to true every time an update was received, thereby ignoring every second event. But I also found that sometimes a change to the file only resulted in one update. I'm not sure what causes this, but it seemed to happen when updates were very frequent and when those updates were appends (>>). I did not observe any instances of a single change triggering more than two events.
There's more discussion of the issue in this question. I also posted some example code for my solution there.
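A rough sketch of that flag approach (not the exact code from the linked answer, and as noted above it breaks down when a change only fires one event):

var fs = require('fs');

var ignore = false;
fs.watch('file.txt', function (eventType, filename) {
  if (ignore) {    // this is the second event of the pair, skip it
    ignore = false;
    return;
  }
  ignore = true;   // expect (and ignore) one more event for this change
  console.log(eventType, filename); // handle the change here
});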

Another suggestion for an npm module which is much better than fs.watch or fs.watchFile:
https://github.com/paulmillr/chokidar/
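A minimal chokidar sketch (install it with npm install chokidar; the watched path is a placeholder, and per its README it smooths over most of the duplicate-event noise for you):

const chokidar = require('chokidar');

chokidar.watch('example.xml').on('change', (path) => {
  console.log(`${path} changed`); // fires on every change to example.xml
});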

It isn't directly related to the original question, but do you know what this sentence means? (It is from the fs.watch documentation.)
Also note the listener callback is attached to the 'change' event fired by fs.FSWatcher, but it is not the same thing as the 'change' value of eventType.
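My reading of that sentence, with a small sketch to illustrate (the file name is a placeholder): the 'change' event on the returned fs.FSWatcher is the emitter's event name, while the eventType argument passed to the listener can be either 'change' or 'rename'.

const fs = require('fs');

const watcher = fs.watch('example.xml');        // returns an fs.FSWatcher
watcher.on('change', (eventType, filename) => { // 'change' here is the emitter's event name...
  console.log(eventType, filename);             // ...while eventType may be 'change' or 'rename'
});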

The problem has been fixed (at least) with fs.watch now. I didn't try fs.watchFile because it's not recommended by the Node.js documentation.
One change event each time the file is modified
Two rename events when you change a file name: one for the current name and one for the new name
One rename event for a newly created file (copied from another location) or a deleted file
My environment: macOS 10.12.6 and NodeJS v11.10.1
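A quick way to reproduce those observations (the directory path is a placeholder):

const fs = require('fs');

fs.watch('./watched-dir', (eventType, filename) => {
  console.log(eventType, filename);
});
// editing a file in the directory should log a single change,
// while `mv watched-dir/a.txt watched-dir/b.txt` should log a rename for each name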

Related

Node Read Streams - How can I limit the number of open files?

I'm running into AggregateError: EMFILE: too many open files while streaming multiple files.
Machine Details:
MacOS Monterey,
MacBook Pro (14-inch, 2021),
Chip Apple M1 Pro,
Memory 16GB,
Node v16.13.0
I've tried increasing the limits with no luck.
Ideally I would like to be able to set the limit of the number of files open at one time or resolve by closing files as soon as they have been used.
Code below. I've tried to remove the unrelated code and replace it with '//...'.
const MultiStream = require('multistream');
const fs = require('fs-extra'); // Also tried graceful-fs and the standard fs
const { fdir } = require("fdir");
// Also have a require for the bz2 and split2 functions but editing from phone right now
//...
let files = [];
//...
(async () => {
  const crawler = await new fdir()
    .filter((path, isDirectory) => path.endsWith(".bz2"))
    .withFullPaths()
    .crawl("Dir/Sub Dir")
    .withPromise();

  for (const file of crawler) {
    files = [...files, fs.createReadStream(file)]
  }

  multi = await new MultiStream(files)
    // Unzip
    .pipe(bz2())
    // Create chunks from lines
    .pipe(split2())
    .on('data', function (obj) {
      // Code to filter data and extract what I need
      //...
    })
    .on("error", function (error) {
      // Handling parsing errors
      //...
    })
    .on('end', function (error) {
      // Output results
      //...
    })
})();
To prevent pre-opening a filehandle for every single file in your array, you want to open each file only on demand, when it's that particular file's turn to be streamed. And you can do that with multistream.
Per the multi-stream doc, you can lazily create the readStreams by changing this:
for (const file of crawler) {
  files = [...files, fs.createReadStream(file)]
}
to this:
let files = crawler.map((f) => {
  return function () {
    return fs.createReadStream(f);
  };
});
After reading over the npm page for multistream I think I have found something that will help. I have also edited where you are adding the stream to the files array as I don't see a need to instantiate a new array and spread existing elements like you are doing.
To lazily create the streams, wrap them in a function:
var streams = [
  fs.createReadStream(__dirname + '/numbers/1.txt'),
  function () { // will be executed when the stream is active
    return fs.createReadStream(__dirname + '/numbers/2.txt')
  },
  function () { // same
    return fs.createReadStream(__dirname + '/numbers/3.txt')
  }
]

new MultiStream(streams).pipe(process.stdout) // => 123
With that, we can update your logic by simply wrapping the readStreams in functions; this way the streams will not be created until they are needed, which prevents you from having too many open at once. We can do this by updating your file loop:
for (const file of crawler) {
  files.push(function () {
    return fs.createReadStream(file)
  })
}

How to download multiple links from a .csv file using multithreading in node.js?

I am trying to download links from a .csv file and store the downloaded files in a folder. I have used a multithreading library for this, i.e. mt-files-downloader. The files download fine, but it takes too much time to download about 313 files. These files are about 400 KB in size at most. When I tried a normal download using Node I could download them in a minute or two, but with this library, which should be faster because it is multithreaded, it takes a lot of time. Below is my code; any help would be useful. Thanks!
var rec;
csv
  .fromStream(stream, { headers: ["Recording", , , , , , , ,] })
  .on("data", function (records) {
    rec = records.Recording;
    //console.log(rec);
    download(rec);
  })
  .on("end", function () {
    console.log('Reading complete')
  });

function download(rec) {
  var filename = rec.replace(/\//g, '');
  var filePath = './recordings/' + filename;
  var downloadPath = path.resolve(filePath)
  var fileUrl = 'http:' + rec;
  var downloader = new Downloader();
  var dl = downloader.download(fileUrl, downloadPath);
  dl.start();
  dl.on('error', function (dl) {
    var dlUrl = dl.url;
    console.log('error downloading = > ' + dl.url + ' restarting download....');
    if (!dlUrl.endsWith('.wav') && !dlUrl.endsWith('Recording')) {
      console.log('resuming file download => ' + dlUrl);
      dl.resume();
    }
  });
}
You're right, downloading 313 files of 400kB should not take long - and I don't think this has to do with your code - maybe the connection is bad? Have you tried downloading a single file via curl?
Anyway I see two problems in your approach with which I can help:
first - you download all the files at the same time (which may introduce some overhead on the server)
second - your error handling will run in a loop without waiting and checking the actual file, so if there's a 404 you'll flood the server with requests.
Using streams with on('data') events has a major drawback of executing all the chunks more or less synchronously as they are read. This means that your code will execute whatever is in on('data') handler never waiting for completion of your downloads. The only limiting factor is now how fast the server can read the csv - and I'd expect millions of lines per second to be normal.
From the server perspective, you're simply requesting 313 files at once, which will result, not wanting to speculate on the actual technical mechanisms of the server, in some of those requests waiting and interfering with each other.
This can be solved by using a streaming framework, like scramjet, event-stream or highland for instance. I'm the author of the first and it's IMHO the easiest in this case, but you can use any of those, changing the code a little to match their API - it's pretty similar in all cases anyway.
Here's a heavily commented code that will run a couple downloads in parallel:
const {StringStream} = require("scramjet");
const sleep = require("sleep-promise");
const Downloader = require('mt-files-downloader');
const downloader = new Downloader();
// First we create a StringStream class from your csv stream
StringStream.from(csvStream)
  // we parse it as CSV without columns
  .CSVParse({ header: false })
  // we set the limit of parallel operations, it will get propagated.
  .setOptions({ maxParallel: 16 })
  // now we extract the first column as `recording` and create a
  // download request.
  .map(([recording]) => {
    // here's the first part of your code
    const filename = recording.replace(/\//g, '');
    const filePath = './recordings/' + filename;
    const downloadPath = path.resolve(filePath);
    const fileUrl = 'http:' + recording;
    // at this point we return the dl object so we can keep these
    // parts separate.
    // see that the download hasn't been started yet
    return downloader.download(fileUrl, downloadPath);
  })
  // what we get is a stream of not started download objects
  // so we run this asynchronous function. If this returns a Promise
  // it will wait
  .map(
    async (dl) => new Promise((res, rej) => {
      // let's assume a couple retries we allow
      let retries = 10;
      dl.on('error', async (dl) => {
        try {
          // here we reject if the download fails too many times.
          if (retries-- === 0) throw new Error(`Download of ${dl.url} failed too many times`);
          var dlUrl = dl.url;
          console.log('error downloading = > ' + dl.url + ' restarting download....');
          if (!dlUrl.endsWith('.wav') && !dlUrl.endsWith('Recording')) {
            console.log('resuming file download => ' + dlUrl);
            // lets wait half a second before retrying
            await sleep(500);
            dl.resume();
          }
        } catch (e) {
          // here we call the `reject` function - meaning that
          // this file wasn't downloaded despite retries.
          rej(e);
        }
      });
      // here we call `resolve` function to confirm that the file was
      // downloaded.
      dl.on('end', () => res());
      // and here we actually start the download
      dl.start();
    })
  )
  // we log some message and ignore the result in case of an error
  .catch(e => {
    console.error('An error occured:', e.message);
    return;
  })
  // Every stream must have some sink to flow to, the `run` method runs
  // every operation above.
  .run();
You can also use the stream to push out some kind of log messages and use pipe(process.stderr) in the end, instead of those console.logs. Please check the scramjet documentation for additional info and a Mozilla doc on async functions

How to use the Node.js forEach function with an event listener

I am not sure where I am going wrong, but I think the event listener is getting invoked multiple times and parsing the files multiple times.
I have five files in the directory and they are getting parsed. However, the PDF file at array index 0 gets parsed once, the next one twice, and the third one three times.
I want each file in the directory to be parsed once, and a text file created by extracting the data from the PDF.
The idea is to parse the PDF, get the content as text, and convert the text into JSON in a specific format.
To make it simple, the plan is to complete one task first, then use the output from the code below to perform the next task.
Hope someone can help, point out where I am going wrong, and explain my mistake a bit so I understand it. (New to JS and Node.)
Regards,
Jai
Using the module from here:
https://github.com/modesty/pdf2json
var fs = require('fs')
PDFParser = require('C:/Users/Administrator/node_modules/pdf2json/PDFParser')

var pdfParser = new PDFParser(this, 1)

fs.readdir('C:/Users/Administrator/Desktop/Project/Input/', function (err, pdffiles) {
  //console.log(pdffiles)
  pdffiles.forEach(function (pdffile) {
    console.log(pdffile)
    pdfParser.once("pdfParser_dataReady", function () {
      fs.writeFile('C:/Users/Administrator/Desktop/Project/Jsonoutput/' + pdffile, pdfParser.getRawTextContent())
      pdfParser.loadPDF('C:/Users/Administrator/Desktop/Project/Input/' + pdffile)
    })
  })
})
As mentioned in the comment, I am just contributing 'work-around' ideas for the OP to temporarily resolve this issue.
Assuming performance is not an issue, you should be able to asynchronously parse the PDF files in a sequential manner, that is, only parse the next file when the first one is done.
Unfortunately I have never used the npm module PDFParser before, so it is really difficult for me to try the code below. Pardon me, as it may require some minor tweaks to make it work; syntactically it should be fine as it was written using an IDE.
Example:
var fs = require('fs');
PDFParser = require('C:/Users/Administrator/node_modules/pdf2json/PDFParser');

var parseFile = function (files, done) {
  var pdfFile = files.pop();
  if (pdfFile) {
    var pdfParser = new PDFParser();
    pdfParser.on("pdfParser_dataError", errData => { return done(errData); });
    pdfParser.on("pdfParser_dataReady", pdfData => {
      fs.writeFile("C:/Users/Administrator/Desktop/Project/Jsonoutput/" + pdfFile, JSON.stringify(pdfData));
      parseFile(files, done);
    });
    pdfParser.loadPDF('C:/Users/Administrator/Desktop/Project/Input/' + pdfFile);
  } else {
    return done(null, "All pdf files parsed.")
  }
};

fs.readdir('C:/Users/Administrator/Desktop/Project/Input/', function (err, pdffiles) {
  parseFile(pdffiles, (err, message) => {
    if (err) { console.error(err.parseError); }
    else { console.log(message); }
  })
});
In the code above, I have isolated the parsing logic into a separate function called parseFile. This function pops the next file off the array; if there are no files left it invokes the done callback, otherwise it starts parsing the file it popped.
When parsing is done, it recursively calls parseFile until the last file is parsed.

fs.watch fired twice when I change the watched file

fs.watch('example.xml', function (curr, prev) {
  // on file change we can read the new xml
  fs.readFile('example.xml', 'utf8', function (err, data) {
    if (err) throw err;
    console.dir(data);
    console.log('Done');
  });
});
OUTPUT:
some data
Done X 1
some data
Done X 2
Is it my usage fault, or ..?
The fs.watch api:
is unstable
has known "behaviour" with regards repeated notifications. Specifically, the windows case being a result of windows design, where a single file modification can be multiple calls to the windows API
I make allowance for this by doing the following:
var fsTimeout

fs.watch('file.js', function (e) {
  if (!fsTimeout) {
    console.log('file.js %s event', e)
    fsTimeout = setTimeout(function () { fsTimeout = null }, 5000) // give 5 seconds for multiple events
  }
})
I suggest working with chokidar (https://github.com/paulmillr/chokidar), which is much better than fs.watch.
Quoting its README.md:
Node.js fs.watch:
Doesn't report filenames on OS X.
Doesn't report events at all when using editors like Sublime on OS X.
Often reports events twice.
Emits most changes as rename.
Has a lot of other issues
Does not provide an easy way to recursively watch file trees.
Node.js fs.watchFile:
Almost as bad at event handling.
Also does not provide any recursive watching.
Results in high CPU utilization.
If you need to watch your file for changes, you can check out my small library on-file-change. It checks the file's sha1 hash between fired change events.
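A sketch of that hash-check idea using Node's built-in crypto module (this is not the on-file-change code itself; the file name is a placeholder):

const fs = require('fs');
const crypto = require('crypto');

let lastHash;
fs.watch('file.txt', () => {
  const hash = crypto.createHash('sha1')
    .update(fs.readFileSync('file.txt'))
    .digest('hex');
  if (hash === lastHash) return; // same content, so ignore the duplicate event
  lastHash = hash;
  console.log('content really changed');
});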
Explanation of why we have multiple fired events:
You may notice in certain situations that a single creation event generates multiple Created events that are handled by your component. For example, if you use a FileSystemWatcher component to monitor the creation of new files in a directory, and then test it by using Notepad to create a file, you may see two Created events generated even though only a single file was created. This is because Notepad performs multiple file system actions during the writing process. Notepad writes to the disk in batches that create the content of the file and then the file attributes. Other applications may perform in the same manner. Because FileSystemWatcher monitors the operating system activities, all events that these applications fire will be picked up.
Source
My custom solution
I personally like using return to prevent a block of code from running when checking something, so here is my method:
var watching = false;
fs.watch('./file.txt', () => {
  if (watching) return;
  watching = true;

  // do something

  // the timeout is to prevent the script from running twice for short functions;
  // the delay can be longer to disable the function for a set time
  setTimeout(() => {
    watching = false;
  }, 100);
});
Feel free to use this example to simplify your code. It may NOT be better than using a module from others, but it works pretty well!
Similar/same problem. I needed to do some stuff with images when they were added to a directory. Here's how I dealt with the double firing:
var fs = require('fs');
var active = false;

fs.watch('directory', function (event, filename) {
  if (filename && event == 'change' && active == false) {
    active = true;
    //do stuff to the new file added
    active = false;
  }
});
It will ignore the second firing until it finishes what it has to do with the new file.
I'm dealing with this issue for the first time, so all of the answers so far are probably better than my solution. However, none of them were 100% suitable for my case, so I came up with something slightly different – I used an XOR operation to flip an integer between 0 and 1, effectively keeping track of and ignoring every second event on the file:
var targetFile = "./watchThis.txt";
var flippyBit = 0;

fs.watch(targetFile, { persistent: true }, function (event, filename) {
  if (event == 'change') {
    if (!flippyBit) {
      var data = fs.readFile(targetFile, "utf8", function (error, data) {
        gotUpdate(data);
      })
    } else {
      console.log("Doing nothing thanks to flippybit.");
    }
    flipBit(); // call flipBit() function
  }
});

// Whatever we want to do when we see a change
function gotUpdate(data) {
  console.log("Got some fresh data:");
  console.log(data);
}

// Toggling this gives us the "every second update" functionality
function flipBit() {
  flippyBit = flippyBit ^ 1;
}
I didn't want to use a time-related function (like jwymanm's answer) because the file I'm watching could hypothetically get legitimate updates very frequently. And I didn't want to use a list of watched files like Erik P suggests, because I'm only watching one file. Jan Święcki's solution seemed like overkill, as I'm working on extremely short and simple files in a low-power environment. Lastly, Bernado's answer made me a little nervous – it would only ignore the second update if it arrived before I'd finished processing the first, and I can't handle that kind of uncertainty. If anyone were to find themselves in this very specific scenario, there might be some merit to the approach I used? If there's anything massively wrong with it please do let me know/edit this answer, but so far it seems to work well?
NOTE: Obviously this strongly assumes that you'll get exactly 2 events per real change. I carefully tested this assumption, obviously, and learned its limitations. So far I've confirmed that:
Modifying a file in Atom editor and saving triggers 2 updates
touch triggers 2 updates
Output redirection via > (overwriting file contents) triggers 2 updates
Appending via >> sometimes triggers 1 update!*
I can think of perfectly good reasons for the differing behaviours but we don't need to know why something is happening to plan for it – I just wanted to stress that you'll want to check for yourself in your own environment and in the context of your own use cases (duh) and not trust a self-confessed idiot on the internet. That being said, with precautions taken I haven't had any weirdness so far.
* Full disclosure, I don't actually know why this is happening, but we're already dealing with unpredictable behaviour with the watch() function so what's a little more uncertainty? For anyone following along at home, more rapid appends to a file seem to cause it to stop double-updating but honestly, I don't really know, and I'm comfortable with the behaviour of this solution in the actual case it'll be used, which is a one-line file that will be updated (contents replaced) like twice per second at the fastest.
The first is change and the second is rename.
We can tell the difference in the listener function:
function(event, filename) {
}
The listener callback gets two arguments (event, filename). event is either 'rename' or 'change', and filename is the name of the file which triggered the event.
// rm sourcefile targetfile
fs.watch(sourcefile_dir, function (event, targetfile) {
  console.log(targetfile, 'is', event)
})
As sourcefile is renamed to targetfile, it will in fact fire three events:
null is rename // sourcefile not exist again
targetfile is rename
targetfile is change
Notice that if you want to catch all three of these events, watch the dir of sourcefile.
I sometimes get multiple registrations of the watch event, causing it to fire several times.
I solved it by keeping a list of watched files and avoiding registering the event if the file is already in the list:
var watchlist = {};
function initwatch(fn, callback) {
  if (!watchlist[fn]) {
    watchlist[fn] = true;
    fs.watch(fn).on('change', callback);
  }
}
......
Like other answers say... this has a lot of troubles, but I can deal with it this way:
var folder = "/folder/path/";
var active = true; // flag control

fs.watch(folder, function (event, filename) {
  if (event === 'rename' && active) { // you can remove this "check" event
    active = false;

    // ... its just an example
    for (var i = 0; i < 100; i++) {
      console.log(i);
    }

    // ... other stuffs and delete the file
    if (!active) {
      try {
        fs.unlinkSync(folder + filename);
      } catch (err) {
        console.log(err);
      }
      active = true
    }
  }
});
Hope I can help you...
Easiest solution:
const watch = (path, opt, fn) => {
  var lock = false
  fs.watch(path, opt, function () {
    if (!lock) {
      lock = true
      fn()
      setTimeout(() => lock = false, 1000)
    }
  })
}

watch('/path', { interval: 500 }, function () {
  // ...
})
I was downloading a file with Puppeteer and, once the file was saved, sending automatic emails. Due to the problem above, I noticed I was sending 2 emails. I solved it by stopping my application using process.exit() and auto-starting it with pm2. Using flags in the code didn't save me.
If anyone has this problem in the future, they can use this solution as well: exit from the program and restart it automatically with monitoring tools.
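A sketch of that exit-and-restart approach (the file path and sendEmail function are hypothetical, and a process manager such as pm2 is assumed to restart the script once it exits):

const fs = require('fs');

async function sendEmail() { /* hypothetical one-shot work */ }

fs.watch('./downloaded-file.pdf', () => {
  sendEmail().then(() => process.exit(0)); // quit; the process manager restarts the script and re-arms the watcher
});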
Here's my simple solution. It works well every time.
// Update obj as file updates
obj = JSON.parse(fs.readFileSync('./file.json', 'utf-8'));

fs.watch('./file.json', () => {
  const data = JSON.parse(fs.readFileSync('./file.json', 'utf-8') || '{}');
  if (Object.entries(data).length > 0) { // This checks fs.watch() isn't false-firing
    obj = data;
    console.log('File actually changed: ', obj)
  }
});
I came across the same issue. If you don't want to trigger multiple times, you can use a debounce function.
fs.watch('example.xml', _.debounce(function (curr, prev) {
  // on file change we can read the new xml
  fs.readFile('example.xml', 'utf8', function (err, data) {
    if (err) throw err;
    console.dir(data);
    console.log('Done');
  });
}, 100));
Debouncing The Observer
A solution I arrived at was that (a) there needs to be a workaround for the problem in question and, (b), there needs to be a solution to ensure multiple rapid Ctrl+s actions do not cause Race Conditions. Here's what I have...
./**/utilities.js (somewhere)
export default {
  ...
  debounce(fn, delay) { // #thxRemySharp https://remysharp.com/2010/07/21/throttling-function-calls/
    var timer = null;
    return function execute(...args) {
      var context = this;
      clearTimeout(timer);
      timer = setTimeout(fn.bind(context, ...args), delay);
    };
  },
  ...
};
./**/file.js (elsewhere)
import utilities from './**/utilities.js'; // somewhere
...
function watch(server) {
  const debounced = utilities.debounce(observeFilesystem.bind(this, server), 1000 * 0.25);
  const observers = new Set()
    .add(fs.watch('./src', debounced))
    .add(fs.watch('./index.html', debounced))
    ;
  console.log(`watching... (${observers.size})`);
  return observers;
}

function observeFilesystem(server, type, filename) {
  if (!filename) console.warn(`Tranfer Dev Therver: filesystem observation made without filename for type ${type}`);
  console.log(`Filesystem event occurred:`, type, filename);
  server.close(handleClose);
}
...
This way, the observation handler that we pass into fs.watch is (in this case) a bound function which gets debounced if multiple calls are made less than 1000 * 0.25 seconds (250ms) apart from one another.
It may be worth noting that I have also devised a pipeline of Promises to help avoid other types of Race Conditions as the code also leverages other callbacks. Please also note the attribution to Remy Sharp whose debounce function has repeatedly proven very useful over the years.
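Not the author's actual pipeline, but one way to sketch the promise-pipeline idea (rebuild is a hypothetical async step) is to chain each piece of triggered work onto the previous one so handlers never overlap:

let pipeline = Promise.resolve();

async function rebuild(type, filename) { /* hypothetical async work */ }

function enqueueFsEvent(type, filename) {
  pipeline = pipeline
    .then(() => rebuild(type, filename))            // runs only after the previous work settles
    .catch((err) => console.error('rebuild failed:', err));
}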
watcher = fs.watch('example.xml', function (curr, prev) {
  watcher.close();
  fs.readFile('example.xml', 'utf8', function (err, data) {
    if (err) throw err;
    console.dir(data);
    console.log('Done');
  });
});
I had a similar problem, but I was also reading the file in the callback, which caused a loop.
This is where I found how to close watcher:
How to close fs.watch listener for a folder
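If you still want to keep watching after the first change, one pattern (just a sketch) is to close the watcher, do the work, then watch again:

const fs = require('fs');

function watchOnce(file, onChange) {
  const watcher = fs.watch(file, () => {
    watcher.close();                           // any duplicate events die with the watcher
    onChange(() => watchOnce(file, onChange)); // the caller re-arms the watch when its work is done
  });
}

watchOnce('example.xml', (done) => {
  fs.readFile('example.xml', 'utf8', (err, data) => {
    if (err) throw err;
    console.dir(data);
    done(); // start watching again
  });
});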
NodeJS does not fire multiple events for a single change, it is the editor you are using updating the file multiple times.
Editors use the stream API for efficiency: they read and write data in chunks, which causes multiple updates depending on the chunk size and the amount of content. Here is a snippet to test whether fs.watch fires multiple events:
const http = require('http');
const fs = require('fs');
const path = require('path');
const host = 'localhost';
const port = 3000;
const file = path.join(__dirname, 'config.json');

const requestListener = function (req, res) {
  const data = new Date().toString();
  fs.writeFileSync(file, data, { encoding: 'utf-8' });
  res.end(data);
};

const server = http.createServer(requestListener);
server.listen(port, host, () => {
  fs.watch(file, (eventType, filename) => {
    console.log({ eventType });
  });
  console.log(`Server is running on http://${host}:${port}`);
});
I believe a simple solution would be checking for the last modified timestamp:
const { stat } = require('fs/promises');

let lastModified;
fs.watch(file, (eventType, filename) => {
  stat(file).then(({ mtimeMs }) => {
    if (lastModified !== mtimeMs) {
      lastModified = mtimeMs;
      console.log({ eventType, filename });
    }
  });
});
Please note that you need to use all-sync or all-async methods, otherwise you will have issues:
Update the file in an editor and you will see only a single event is logged:
const http = require('http');
const host = 'localhost';
const port = 3000;
const fs = require('fs');
const path = require('path');
const file = path.join(__dirname, 'config.json');
let lastModified;

const requestListener = function (req, res) {
  const data = Date.now().toString();
  fs.writeFileSync(file, data, { encoding: 'utf-8' });
  lastModified = fs.statSync(file).mtimeMs;
  res.end(data);
};

const server = http.createServer(requestListener);
server.listen(port, host, () => {
  fs.watch(file, (eventType, filename) => {
    const mtimeMs = fs.statSync(file).mtimeMs;
    if (lastModified !== mtimeMs) {
      lastModified = mtimeMs;
      console.log({ eventType });
    }
  });
  console.log(`Server is running on http://${host}:${port}`);
});
A few notes on the alternative solutions: storing files for comparison will be memory inefficient, especially if you have large files; taking file hashes will be expensive; custom flags are hard to keep track of, especially if you are going to detect changes made by other applications; and lastly, unsubscribing and re-subscribing requires unnecessary juggling.
If you don't need an instant result, you can use setTimeout to debounce successive events:
let timeoutId;
fs.watch(file, (eventType, filename) => {
  clearTimeout(timeoutId);
  timeoutId = setTimeout(() => {
    console.log({ eventType });
  }, 100);
});

non-blocking way to write to filesystem with node.js

I've written a non-blocking TCP server with Node.js. This server listens on a port and reroutes the request to another server via an http.request().
To have a backlog of the rerouted messages, I want to append every message (a single line of information) to a file with the date as the filename.
The server is going to be hit by several devices at alternating intervals with small text strings (800 bytes). Writing to the filesystem implicitly calls for a blocking event. Is there a way to prevent this behavior?
If appendFile doesn't work out right, I have myself tested a solution for this using file streams that works with multiple clusters and won't clobber the output.
Just use the asynchronous methods of the fs module like appendFile.
http://nodejs.org/api/fs.html#fs_fs_appendfile_filename_data_encoding_utf8_callback
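A minimal fs.appendFile sketch along those lines (the date-based filename and message are placeholders): the call returns immediately and the actual write happens off the event loop in libuv's thread pool.

var fs = require('fs');

var filename = new Date().toISOString().slice(0, 10) + '.log'; // date as filename
var line = new Date().toISOString() + ' rerouted message\n';

fs.appendFile(filename, line, function (err) {
  if (err) console.error('append failed:', err);
});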
Something like this might help.
var fs = require('fs');

var writer = {
  files: {},
  appendFile: function (path, data) {
    if (this.files[path] === undefined) {
      this.files[path] = { open: false, queue: [] };
    }
    this.files[path].queue.push(data);
    if (!this.files[path].open) {
      this.files[path].open = true;
      this.nextWrite(path);
    }
  },
  nextWrite: function (path) {
    var data = this.files[path].queue.shift(),
        self = this;
    if (data === undefined)
      return this.files[path].open = false;
    fs.appendFile(path, data, function (err) {
      if (err) throw err;
      self.nextWrite(path);
    });
  }
}
It requires Node version 0.8.0 for fs.appendFile, but it keeps a queue per file and then appends the items in the order they were added. It works, but I didn't spend very much time on it, so use it for educational purposes only.
writer.appendFile('test.txt','hello');
