I've successfully written a few nodejs HTTP handlers to serve data in response to an HTTP request. However, everything I've written so far uses the *Sync versions of functions, and I'm now quickly running into the limitations of this approach.
I cannot figure out, however, how to properly use asynchronous functions in the HTTP request context. If I try an async call, processing quickly falls through and returns without giving the code a chance to process data.
What's the correct approach? I haven't been able to find any good examples, so any pointers to literature are appreciated. Short of that, what's an example of a handler for a GET request that scans a local directory and, say, returns a json list of file names and corresponding number of lines (or really any stub code of the above that shows the proper technique).
Here's a simple sample:
var http = require('http')
var fs = require('fs')
function dir (req, res) {
fs.readdir('.', function (error, files) {
if (error) {
res.writeHead(500)
res.end(error.message)
return
}
files.forEach(function (file) {
res.write(file + '\n')
})
res.end()
})
}
var server = http.createServer(dir)
server.listen(7000)
Run with node server.js and test it with curl localhost:7000.
Yes, the request handler returns before the readdir callback is executed. That is by design; it's how async programming works, and it's OK. When the filesystem IO is done, the callback will execute and the response will be sent.
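To get closer to what the question asks for (a JSON list of file names with line counts), here is a minimal sketch using the built-in promise-based fs.promises API. The names listFilesWithLineCounts and handler are made up for illustration, and "number of lines" is taken to mean the number of newline characters, which is an assumption.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: resolves to [{ name, lines }] for the regular files in `dir`.
async function listFilesWithLineCounts(dir) {
  const entries = await fs.promises.readdir(dir, { withFileTypes: true });
  const files = entries.filter((entry) => entry.isFile());
  // Start all reads at once; Promise.all resolves when every read has finished.
  return Promise.all(files.map(async (entry) => {
    const contents = await fs.promises.readFile(path.join(dir, entry.name), 'utf8');
    // Assumption: a "line" is counted per newline character.
    return { name: entry.name, lines: contents.split('\n').length - 1 };
  }));
}

// The handler returns immediately; the response is sent once the promise settles.
function handler(req, res) {
  listFilesWithLineCounts('.')
    .then((result) => {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(result));
    })
    .catch((error) => {
      res.writeHead(500);
      res.end(error.message);
    });
}

// Usage (not started here): require('http').createServer(handler).listen(7000);
```

The point is the same as in the callback version: nothing waits inside the handler, and res.end() is only called from the code that runs after the IO has completed.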
Peter Lyons' answer is great/correct. I'm going to expand on it a bit and suggest a different method of synchronization using promises and co as well as nested/looping asynchronicity.
/* Script to count all lines of a file */
const co = require("co");
// Promisified fs -- eventually node will support this on its own
const fs = require("mz/fs");
const rootDir = 'files/';
// Recursively count the lines of all files in the given directory and sum them
function countLines(directory) {
// We can only use `yield` inside a generator block
// `co` allows us to do this and walks through the generator on its own
// `yield` will not move to the next line until the promise resolves
//
// This is still asynchronous code but it is written in a way
// that makes it look synchronous. This entire block is asynchronous, so we
// can `countLines` of multiple directories simultaneously
return co(function* () {
// `files` will be an array of files in the given directory
const files = yield fs.readdir(directory);
// `.map` will create an array of promises. `yield` only completes when
// *all* promises in the array have resolved
const lines = yield files.map(file => countFileLines(file, directory));
// Sum the lines of all files in this directory
return lines.reduce((a, b) => a + b, 0);
});
}
function countFileLines(file, directory) {
// We need the full path to read the file
const fullPath = `${directory}/${file}`;
// `co` returns a promise, so `co` itself can be yielded
// This entire block is asynchronous so we should be able to count lines
// of files without waiting for each file to be read
return co(function* () {
// Used to check whether this file is a directory
const stats = yield fs.stat(fullPath);
if (stats.isDirectory()) {
// If it is, recursively count lines of this directory
return countLines(fullPath);
}
// Otherwise just get the line count of the file
const contents = yield fs.readFile(fullPath, "utf8");
return contents.split("\n").length - 1;
});
}
co(function* () {
console.log(yield countLines(rootDir));
})
// All errors propagate here
.catch(err => console.error(err.stack));
Note that this is just an example. There are probably already libraries to count lines of files in a directory and there are definitely libraries that simplify recursive reading/globbing of files.
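Since this was written, Node has gained a built-in promise API (fs.promises) and async/await, so the same recursion can be sketched without co or mz. This is an illustrative rewrite rather than a drop-in replacement: countLinesInTree is a hypothetical name, and entries that are neither directories nor regular files are simply skipped, which is my assumption.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively sum the newline counts of all regular files under `directory`.
async function countLinesInTree(directory) {
  const entries = await fs.promises.readdir(directory, { withFileTypes: true });
  // Kick off every entry at once; Promise.all waits for all of them.
  const counts = await Promise.all(entries.map(async (entry) => {
    const fullPath = path.join(directory, entry.name);
    if (entry.isDirectory()) return countLinesInTree(fullPath);
    if (!entry.isFile()) return 0; // assumption: skip sockets, symlinks, etc.
    const contents = await fs.promises.readFile(fullPath, 'utf8');
    return contents.split('\n').length - 1;
  }));
  return counts.reduce((a, b) => a + b, 0);
}

// Usage: countLinesInTree('files/').then(console.log, console.error);
```

As with the co version, errors from any level of the recursion reject the returned promise, so a single error handler at the call site is enough.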
Which is the most elegant way or technology to let a node.js server know when a file is created on the server?
The idea is: a new image has been created (from a webcam or so) -> dispatch an event!
UPDATE: The name of the new file in the directory is not known a priori and the file is generated by an external software.
You should take a look at fs.watch(). It allows you to "watch" a file or directory and receive events when things change.
Note: The documentation states that fs.watch is not consistent across platforms, so you should take that into account before using it.
fs.watch(fileOrDirectoryPath, function(event, filename) {
// Something changed with filename, trigger event appropriately
});
Also something to be aware of from the docs:
Providing filename argument in the callback is not supported on every
platform (currently it's only supported on Linux and Windows). Even on
supported platforms filename is not always guaranteed to be provided.
Therefore, don't assume that filename argument is always provided in
the callback, and have some fallback logic if it is null.
If filename is not available on your platform and you're watching a directory you may need to do something where you initially read the directory and cache the list of files in it. Then, when you get an event from fs.watch, read the directory again and compare it to the cached list of files to see what was added (if anything).
Update 1: There's a good module called watch, on github, which makes it easy to watch a directory for new files.
Update 2: I threw together an example of how to use fs.watch to get notified when new files are added to a directory. I think the module I linked to above is probably the better way to go, but I thought it would be nice to have a basic example of how it might work if you were to do it yourself.
Note: This is a fairly simplistic example just to show how it could work in general. It could almost certainly be done more efficiently and it's far from thoroughly tested.
function watchForNewFiles(directory, callback) {
// Get a list of all the files in the directory
fs.readdir(directory, function(err, files) {
if (err) {
callback(err);
} else {
var originalFiles = files;
// Start watching the directory for new events
var watcher = fs.watch(directory, function(event, filename) {
// Get the updated list of all the files in the directory
fs.readdir(directory, function(err, files) {
if (err) {
callback(err);
} else {
// Filter out any files we already knew about
var newFiles = files.filter(function(f) {
return (originalFiles.indexOf(f) < 0);
});
// Reset our list of "original" files
originalFiles = files;
// If there are new files detected, call the callback
if (newFiles.length) {
callback(null, newFiles);
}
}
})
});
}
});
}
Then, to watch a directory you'd call it with:
watchForNewFiles(someDirectoryPath, function(err, files) {
if (err) {
// handle error
} else {
// handle any newly added files
// "files" is an array of filenames that have been added to the directory
}
});
I came up with my own solution using this code here:
var fs = require('fs');
var intID = setInterval(check,1000);
function check() {
fs.exists('file.txt', function check(exists) {
if (exists) {
console.log("Created!");
clearInterval(intID);
}
});
}
You could add a parameter to the check function for the name of the file and use it as the path.
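A parameterised version of that polling idea might look like the sketch below. waitForFile is a hypothetical name, and I've swapped fs.exists for fs.access because fs.exists is deprecated in current Node versions; otherwise the approach is the same.

```javascript
const fs = require('fs');

// Hypothetical helper: polls every `intervalMs` until `filePath` exists,
// then calls `onCreated(filePath)` exactly once.
function waitForFile(filePath, intervalMs, onCreated) {
  let found = false;
  const timer = setInterval(() => {
    fs.access(filePath, fs.constants.F_OK, (err) => {
      if (err || found) return; // not there yet, or already reported
      found = true;
      clearInterval(timer);
      onCreated(filePath);
    });
  }, intervalMs);
  return timer; // lets the caller cancel with clearInterval(timer)
}

// Usage: waitForFile('file.txt', 1000, (p) => console.log('Created!', p));
```

Note that this only works when the file name is known in advance, so it does not cover the case from the question's update where the name is not known a priori.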
I did some tests on fs.watch() and it does not work if the file is not created. fs.watch() has multiple issues anyways and I would never suggest using it... It does work to check if the file was deleted though...
fs.watch( 'example.xml', function ( curr, prev ) {
// on file change we can read the new xml
fs.readFile( 'example.xml','utf8', function ( err, data ) {
if ( err ) throw err;
console.dir(data);
console.log('Done');
});
});
OUTPUT:
some data
Done X 1
some data
Done X 2
Is it my usage fault or ..?
The fs.watch api:
is unstable
has known "behaviour" with regard to repeated notifications. Specifically, the Windows case is a result of Windows design, where a single file modification can result in multiple calls to the Windows API
I make allowance for this by doing the following:
var fsTimeout
fs.watch('file.js', function(e) {
if (!fsTimeout) {
console.log('file.js %s event', e)
fsTimeout = setTimeout(function() { fsTimeout=null }, 5000) // give 5 seconds for multiple events
}
})
I suggest working with chokidar (https://github.com/paulmillr/chokidar), which is much better than fs.watch.
Quoting its README.md:
Node.js fs.watch:
Doesn't report filenames on OS X.
Doesn't report events at all when using editors like Sublime on OS X.
Often reports events twice.
Emits most changes as rename.
Has a lot of other issues
Does not provide an easy way to recursively watch file trees.
Node.js fs.watchFile:
Almost as bad at event handling.
Also does not provide any recursive watching.
Results in high CPU utilization.
If you need to watch your file for changes then you can check out my small library on-file-change. It checks file sha1 hash between fired change events.
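The hash-comparison idea can be sketched with Node's built-in crypto module. To be clear, this is not the on-file-change library's actual code, just my reading of the technique; createContentChangeDetector is a hypothetical name, and the synchronous read assumes the file is small.

```javascript
const crypto = require('crypto');
const fs = require('fs');

function sha1(buffer) {
  return crypto.createHash('sha1').update(buffer).digest('hex');
}

// Hypothetical helper: returns a function that reports whether the file's
// content hash differs from the one seen on the previous call.
function createContentChangeDetector(filePath) {
  let lastHash = null;
  return function hasChanged() {
    const hash = sha1(fs.readFileSync(filePath));
    if (hash === lastHash) return false; // same bytes: a duplicate event
    lastHash = hash;
    return true;
  };
}

// Usage:
// const hasChanged = createContentChangeDetector('example.xml');
// fs.watch('example.xml', () => { if (hasChanged()) console.log('content changed'); });
```

The first call always reports a change because there is no previous hash to compare against. The read can also throw if the file is briefly missing while an editor replaces it, so real code would want a try/catch around it.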
Explanation of why we have multiple fired events:
You may notice in certain situations that a single creation event generates multiple Created events that are handled by your component. For example, if you use a FileSystemWatcher component to monitor the creation of new files in a directory, and then test it by using Notepad to create a file, you may see two Created events generated even though only a single file was created. This is because Notepad performs multiple file system actions during the writing process. Notepad writes to the disk in batches that create the content of the file and then the file attributes. Other applications may perform in the same manner. Because FileSystemWatcher monitors the operating system activities, all events that these applications fire will be picked up.
Source
My custom solution
I personally like using return to prevent a block of code from running when checking something, so here is my method:
var watching = false;
fs.watch('./file.txt', () => {
if(watching) return;
watching = true;
// do something
// the timeout prevents the script from running twice with short functions
// the delay can be longer to disable the function for a set time
setTimeout(() => {
watching = false;
}, 100);
});
Feel free to use this example to simplify your code. It may NOT be better than using a module from others, but it works pretty well!
Similar/same problem. I needed to do some stuff with images when they were added to a directory. Here's how I dealt with the double firing:
var fs = require('fs');
var active = false;
fs.watch('directory', function (event, filename) {
if (filename && event == 'change' && active == false) {
active = true;
//do stuff to the new file added
active = false;
}
});
It will ignore the second firing until it finishes what it has to do with the new file.
I'm dealing with this issue for the first time, so all of the answers so far are probably better than my solution, however none of them were 100% suitable for my case so I came up with something slightly different – I used a XOR operation to flip an integer between 0 and 1, effectively keeping track of and ignoring every second event on the file:
var targetFile = "./watchThis.txt";
var flippyBit = 0;
fs.watch(targetFile, {persistent: true}, function(event, filename) {
if (event == 'change'){
if (!flippyBit) {
fs.readFile(targetFile, "utf8", function(error, data) {
gotUpdate(data);
})
} else {
console.log("Doing nothing thanks to flippybit.");
}
flipBit(); // call flipBit() function
}
});
// Whatever we want to do when we see a change
function gotUpdate(data) {
console.log("Got some fresh data:");
console.log(data);
}
// Toggling this gives us the "every second update" functionality
function flipBit() {
flippyBit = flippyBit ^ 1;
}
I didn't want to use a time-related function (like jwymanm's answer) because the file I'm watching could hypothetically get legitimate updates very frequently. And I didn't want to use a list of watched files like Erik P suggests, because I'm only watching one file. Jan Święcki's solution seemed like overkill, as I'm working on extremely short and simple files in a low-power environment. Lastly, Bernado's answer made me a little nervous – it would only ignore the second update if it arrived before I'd finished processing the first, and I can't handle that kind of uncertainty. If anyone were to find themselves in this very specific scenario, there might be some merit to the approach I used? If there's anything massively wrong with it please do let me know/edit this answer, but so far it seems to work well?
NOTE: Obviously this strongly assumes that you'll get exactly 2 events per real change. I carefully tested this assumption, obviously, and learned its limitations. So far I've confirmed that:
Modifying a file in Atom editor and saving triggers 2 updates
touch triggers 2 updates
Output redirection via > (overwriting file contents) triggers 2 updates
Appending via >> sometimes triggers 1 update!*
I can think of perfectly good reasons for the differing behaviours but we don't need to know why something is happening to plan for it – I just wanted to stress that you'll want to check for yourself in your own environment and in the context of your own use cases (duh) and not trust a self-confessed idiot on the internet. That being said, with precautions taken I haven't had any weirdness so far.
* Full disclosure, I don't actually know why this is happening, but we're already dealing with unpredictable behaviour with the watch() function so what's a little more uncertainty? For anyone following along at home, more rapid appends to a file seem to cause it to stop double-updating but honestly, I don't really know, and I'm comfortable with the behaviour of this solution in the actual case it'll be used, which is a one-line file that will be updated (contents replaced) like twice per second at the fastest.
The first event is change and the second is rename.
We can tell them apart in the listener function:
function(event, filename) {
}
The listener callback gets two arguments (event, filename). event is either 'rename' or 'change', and filename is the name of the file which triggered the event.
// mv sourcefile targetfile
fs.watch(sourcefile_dir, function(event, targetfile) {
console.log(targetfile, 'is', event)
});
When sourcefile is renamed to targetfile, three events are in fact fired:
null is rename // sourcefile no longer exists
targetfile is rename
targetfile is change
Note that if you want to catch all three events, you need to watch the directory containing sourcefile.
I sometimes get multiple registrations of the watch event, causing it to fire several times.
I solved it by keeping a list of watched files and skipping registration if the file is already in the list:
var watchlist = {};
function initwatch(fn, callback) {
// Only register a watcher the first time we see this file name
if (!watchlist[fn]) {
watchlist[fn] = true;
fs.watch(fn).on('change', callback);
}
}
......
As other answers say, this causes a lot of trouble, but I can deal with it this way:
var folder = "/folder/path/";
var active = true; // flag control
fs.watch(folder, function (event, filename) {
if(event === 'rename' && active) { //you can remove this "check" event
active = false;
// ... its just an example
for (var i = 0; i < 100; i++) {
console.log(i);
}
// ... other stuffs and delete the file
if(!active){
try {
fs.unlinkSync(folder + filename);
} catch(err) {
console.log(err);
}
active = true
}
}
});
Hope this helps...
Easiest solution:
const watch = (path, opt, fn) => {
var lock = false
fs.watch(path, opt, function () {
if (!lock) {
lock = true
fn()
setTimeout(() => lock = false, 1000)
}
})
}
watch('/path', { interval: 500 }, function () {
// ...
})
I was downloading a file with puppeteer and sending automatic emails once the file was saved. Because of the problem above, I noticed I was sending 2 emails. I solved it by stopping my application with process.exit() and letting pm2 restart it automatically. Using flags in the code didn't save me.
If anyone has this problem in the future, this solution is an option as well: exit the program and let a monitoring tool restart it automatically.
Here's my simple solution. It works well every time.
// Update obj as file updates
let obj = JSON.parse(fs.readFileSync('./file.json', 'utf-8'));
fs.watch('./file.json', () => {
const data = JSON.parse(fs.readFileSync('./file.json', 'utf-8') || '{}');
if(Object.entries(data).length > 0) { // This checks fs.watch() isn't false-firing
obj = data;
console.log('File actually changed: ', obj)
}
});
I came across the same issue. If you don't want to trigger multiple times, you can use a debounce function.
fs.watch( 'example.xml', _.debounce(function ( curr, prev ) {
// on file change we can read the new xml
fs.readFile( 'example.xml','utf8', function ( err, data ) {
if ( err ) throw err;
console.dir(data);
console.log('Done');
});
}, 100));
Debouncing The Observer
A solution I arrived at was that (a) there needs to be a workaround for the problem in question and, (b), there needs to be a solution to ensure multiple rapid Ctrl+s actions do not cause Race Conditions. Here's what I have...
./**/utilities.js (somewhere)
export default {
...
debounce(fn, delay) { // #thxRemySharp https://remysharp.com/2010/07/21/throttling-function-calls/
var timer = null;
return function execute(...args) {
var context = this;
clearTimeout(timer);
timer = setTimeout(fn.bind(context, ...args), delay);
};
},
...
};
./**/file.js (elsewhere)
import utilities from './**/utilities.js'; // somewhere
...
function watch(server) {
const debounced = utilities.debounce(observeFilesystem.bind(this, server), 1000 * 0.25);
const observers = new Set()
.add( fs.watch('./src', debounced) )
.add( fs.watch('./index.html', debounced) )
;
console.log(`watching... (${observers.size})`);
return observers;
}
function observeFilesystem(server, type, filename) {
if (!filename) console.warn(`Tranfer Dev Therver: filesystem observation made without filename for type ${type}`);
console.log(`Filesystem event occurred:`, type, filename);
server.close(handleClose);
}
...
This way, the observation handler that we pass into fs.watch is (in this case) a bound function, which gets debounced if multiple calls are made less than 1000 * 0.25 milliseconds (250 ms) apart from one another.
It may be worth noting that I have also devised a pipeline of Promises to help avoid other types of Race Conditions as the code also leverages other callbacks. Please also note the attribution to Remy Sharp whose debounce function has repeatedly proven very useful over the years.
watcher = fs.watch( 'example.xml', function ( curr, prev ) {
watcher.close();
fs.readFile( 'example.xml','utf8', function ( err, data ) {
if ( err ) throw err;
console.dir(data);
console.log('Done');
});
});
I had a similar problem, but I was also reading the file in the callback, which caused a loop.
This is where I found how to close watcher:
How to close fs.watch listener for a folder
NodeJS does not fire multiple events for a single change, it is the editor you are using updating the file multiple times.
Editors use stream API for efficiency, they read and write data in chunks which causes multiple updates depending on the chunks size and the amount of content. Here is a snippet to test if fs.watch fires multiple events:
const http = require('http');
const fs = require('fs');
const path = require('path');
const host = 'localhost';
const port = 3000;
const file = path.join(__dirname, 'config.json');
const requestListener = function (req, res) {
const data = new Date().toString();
fs.writeFileSync(file, data, { encoding: 'utf-8' });
res.end(data);
};
const server = http.createServer(requestListener);
server.listen(port, host, () => {
fs.watch(file, (eventType, filename) => {
console.log({ eventType });
});
console.log(`Server is running on http://${host}:${port}`);
});
I believe a simple solution would be checking for the last modified timestamp:
const { stat } = require('fs/promises');
let lastModified;
fs.watch(file, (eventType, filename) => {
stat(file).then(({ mtimeMs }) => {
if (lastModified !== mtimeMs) {
lastModified = mtimeMs;
console.log({ eventType, filename });
}
});
});
Please note that you need to use all-sync or all-async methods, otherwise you will have issues.
Update the file in an editor and you will see that only a single event is logged:
const http = require('http');
const host = 'localhost';
const port = 3000;
const fs = require('fs');
const path = require('path');
const file = path.join(__dirname, 'config.json');
let lastModified;
const requestListener = function (req, res) {
const data = Date.now().toString();
fs.writeFileSync(file, data, { encoding: 'utf-8' });
lastModified = fs.statSync(file).mtimeMs;
res.end(data);
};
const server = http.createServer(requestListener);
server.listen(port, host, () => {
fs.watch(file, (eventType, filename) => {
const mtimeMs = fs.statSync(file).mtimeMs;
if (lastModified !== mtimeMs) {
lastModified = mtimeMs;
console.log({ eventType });
}
});
console.log(`Server is running on http://${host}:${port}`);
});
A few notes on the alternative solutions: storing files for comparison is memory inefficient, especially if you have large files; taking file hashes is expensive; custom flags are hard to keep track of, especially if you are going to detect changes made by other applications; and lastly, unsubscribing and re-subscribing requires unnecessary juggling.
If you don't need an instant result, you can use setTimeout to debounce successive events:
let timeoutId;
fs.watch(file, (eventType, filename) => {
clearTimeout(timeoutId);
timeoutId = setTimeout(() => {
console.log({ eventType });
}, 100);
});
I would like to perform some arbitrarily expensive work on an arbitrarily large set of files. I would like to report progress in real-time and then display results after all files have been processed. If there are no files that match my expression, I'd like to throw an error.
Imagine writing a test framework that loads up all of your test files, executes them (in no particular order), reports on progress in real-time, and then displays aggregate results after all tests have been completed.
Writing this code in a blocking language (like Ruby for example), is extremely straightforward.
As it turns out, I'm having trouble performing this seemingly simple task in node, while also truly taking advantage of asynchronous, event-based IO.
My first design, was to perform each step serially.
Load up all of the files, creating a collection of files to process
Process each file in the collection
Report the results when all files have been processed
This approach does work, but doesn't seem quite right to me since it causes the more computationally expensive portion of my program to wait for all of the file IO to complete. Isn't this the kind of waiting that Node was designed to avoid?
My second design, was to process each file as it was asynchronously found on disk. For the sake of argument, let's imagine a method that looks something like:
function eachFileMatching(path, expression, callback) {
// recursively, asynchronously traverse the file system,
// calling callback every time a file name matches expression.
}
And a consumer of this method that looks something like this:
eachFileMatching('test/', /_test\.js/, function(err, testFile) {
// read and process the content of testFile
});
While this design feels like a very 'node' way of working with IO, it suffers from 2 major problems (at least in my presumably erroneous implementation):
I have no idea when all of the files have been processed, so I don't know when to assemble and publish results.
Because the file reads are nonblocking, and recursive, I'm struggling with how to know if no files were found.
I'm hoping that I'm simply doing something wrong, and that there is some reasonably simple strategy that other folks use to make the second approach work.
Even though this example uses a test framework, I have a variety of other projects that bump up against this exact same problem, and I imagine anyone writing a reasonably sophisticated application that accesses the file system in node would too.
What do you mean by "read and process the content of testFile"?
I don't understand why you have no idea when all of the files are processed. Are you not using Streams? A stream has several events, not just data. If you handle the end events then you will know when each file has finished.
For instance you might have a list of filenames, set up the processing for each file, and then when you get an end event, delete the filename from the list. When the list is empty you are done. Or create a FileName object that contains the name and a completion status. When you get an end event, change the status and decrement a filename counter as well. When the counter gets to zero you are done, or if you are not confident you could scan all the FileName object to make sure that their status is completed.
You might also have a timer that checks the counter periodically, and if it doesn't change for some period of time, report that the processing might be stuck on the FileName objects whose status is not completed.
... I just came across this scenario in another question and the accepted answer (plus the github link) explains it well. Check out for loop over event driven code?
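The bookkeeping described above (a list of names, removed as each one finishes, done when it is empty) can be sketched as a small tracker object. createCompletionTracker and its method names are hypothetical, and the seal() step is my own addition: it marks the point where no more files will be added, so the "done" callback cannot fire early while the directory scan is still running.

```javascript
// Hypothetical helper: tracks names that have started but not yet finished.
function createCompletionTracker(onAllDone) {
  const pending = new Set();
  let sealed = false;   // true once the caller says no more names will be added
  let reported = false; // guards against calling onAllDone twice

  function check() {
    if (sealed && !reported && pending.size === 0) {
      reported = true;
      onAllDone();
    }
  }

  return {
    start(name) { pending.add(name); },
    finish(name) { pending.delete(name); check(); },
    seal() { sealed = true; check(); },
    unfinished() { return Array.from(pending); },
  };
}

// Usage idea: call tracker.start(file) before opening a stream, call
// tracker.finish(file) in its 'end' handler, and tracker.seal() once the
// directory scan has completed. A periodic timer could log
// tracker.unfinished() to spot files that appear to be stuck.
```

If seal() is called when nothing was ever started, the callback fires right away, which is also one way to detect the "no files found" case.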
As it turns out, the smallest working solution that I've been able to build is much more complicated than I hoped.
Following is code that works for me. It can probably be cleaned up or made slightly more readable here and there, and I'm not interested in feedback like that.
If there is a significantly different way to solve this problem, that is simpler and/or more efficient, I'm very interested in hearing it. It really surprises me that the solution to this seemingly simple requirement would require such a large amount of code, but perhaps that's why someone invented blocking io?
The complexity is really in the desire to meet all of the following requirements:
Handle files as they are found
Know when the search is complete
Know if no files are found
Here's the code:
/**
* Call fileHandler with the file name and file Stat for each file found inside
* of the provided directory.
*
* Call the optionally provided completeHandler with an array of files (mingled
* with directories) and an array of Stat objects (one for each of the found
* files).
*
* Following is an example of a simple usage:
*
* eachFileOrDirectory('test/', function(err, file, stat) {
* if (err) throw err;
* if (!stat.isDirectory()) {
* console.log(">> Found file: " + file);
* }
* });
*
* Following is an example that waits for all files and directories to be
* scanned and then uses the entire result to do something:
*
* eachFileOrDirectory('test/', function() {}, function(err, files, stats) {
* if (err) throw err;
* var len = files.length;
* for (var i = 0; i < len; i++) {
* if (!stats[i].isDirectory()) {
* console.log(">> Found file: " + files[i]);
* }
* }
* });
*/
var eachFileOrDirectory = function(directory, fileHandler, completeHandler) {
var filesToCheck = 0;
var checkedFiles = [];
var checkedStats = [];
directory = (directory) ? directory : './';
var fullFilePath = function(dir, file) {
return dir.replace(/\/$/, '') + '/' + file;
};
var checkComplete = function() {
if (filesToCheck == 0 && completeHandler) {
completeHandler(null, checkedFiles, checkedStats);
}
};
var onFileOrDirectory = function(fileOrDirectory) {
filesToCheck++;
fs.stat(fileOrDirectory, function(err, stat) {
filesToCheck--;
if (err) return fileHandler(err);
checkedFiles.push(fileOrDirectory);
checkedStats.push(stat);
fileHandler(null, fileOrDirectory, stat);
if (stat.isDirectory()) {
onDirectory(fileOrDirectory);
}
checkComplete();
});
};
var onDirectory = function(dir) {
filesToCheck++;
fs.readdir(dir, function(err, files) {
filesToCheck--;
if (err) return fileHandler(err);
files.forEach(function(file, index) {
file = fullFilePath(dir, file);
onFileOrDirectory(file);
});
checkComplete();
});
}
onFileOrDirectory(directory);
};
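For comparison, here is a rough sketch of how the same three requirements could be met with promises and async/await, which did not exist in Node when the code above was written. It borrows the eachFileMatching name from the question; runAll is a hypothetical wrapper, and both assume that onFile may return a promise and that the expression has no g flag (a global regex makes .test() stateful).

```javascript
const fs = require('fs');
const path = require('path');

// Calls onFile for every file under `directory` whose name matches `expression`.
// Resolves to the number of matching files once everything has finished.
async function eachFileMatching(directory, expression, onFile) {
  const entries = await fs.promises.readdir(directory, { withFileTypes: true });
  const counts = await Promise.all(entries.map(async (entry) => {
    const fullPath = path.join(directory, entry.name);
    if (entry.isDirectory()) return eachFileMatching(fullPath, expression, onFile);
    if (!expression.test(entry.name)) return 0;
    await onFile(fullPath); // handled as soon as it is found
    return 1;
  }));
  return counts.reduce((a, b) => a + b, 0);
}

// Hypothetical wrapper: rejects when nothing matched.
async function runAll(directory, expression, onFile) {
  const matched = await eachFileMatching(directory, expression, onFile);
  if (matched === 0) throw new Error('No files matched ' + expression);
  return matched;
}

// Usage:
// runAll('test/', /_test\.js$/, (file) => console.log('>> Found file: ' + file))
//   .then((n) => console.log(n + ' files processed'), console.error);
```

Files are still handled as they are found, the returned promise settles only when the whole tree is done, and the resolved count tells you whether anything matched at all.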
There are two ways of doing this. The first, which processes the files serially, would go something like:
var files = [];
doFile(files, oncomplete);
function doFile(files, oncomplete) {
if (files.length === 0) return oncomplete();
var f = files.pop();
processFile(f, function(err) {
// Handle error if any
doFile(files, oncomplete); // Recurse
});
};
function processFile(file, callback) {
// Do whatever you want to do and once
// done call the callback
...
callback();
};
The second way, let's call it parallel, is similar and goes something like:
var files = [];
doFiles(files, oncomplete);
function doFiles(files, oncomplete) {
var exp = files.length;
var done = 0;
for (var i = 0; i < exp; i++) {
processFile(files[i], function(err) {
// Handle errors (but still need to increment counter)
if (++done === exp) return oncomplete();
});
}
};
function processFile(file, callback) {
// Do whatever you want to do and once
// done call the callback
...
callback();
};
Now it may seem obvious that you should use the second approach, but you'll find that for IO-intensive operations you don't really get any performance gains from parallelising. One disadvantage of the first approach is that the recursion can overflow the call stack if processFile invokes its callback synchronously.
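With promises, both variants get shorter, and the serial one no longer recurses at all. This is only a sketch under the assumption that processFile returns a promise; processSerially and processInParallel are hypothetical names.

```javascript
// Serial: one file at a time. The loop awaits each call, so there is no
// recursion and the stack does not grow with the number of files.
async function processSerially(files, processFile) {
  const results = [];
  for (const file of files) {
    results.push(await processFile(file));
  }
  return results;
}

// Parallel: start everything, then wait for all of it.
// Results come back in input order, regardless of completion order.
function processInParallel(files, processFile) {
  return Promise.all(files.map((file) => processFile(file)));
}

// Usage: processInParallel(files, processFile).then(oncomplete);
```

One difference from the counter-based version above: Promise.all rejects as soon as any single file fails, whereas the counter keeps going and only counts completions, so pick whichever error behaviour you actually want.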