Difference between fs.exists and fs.existsSync - node.js

While working with file I/O in Node I found these two functions (fs.exists and fs.existsSync) for checking whether a file exists on the system. What is the difference between them?

exists is non-blocking: you do the subsequent work with the file inside a callback.
existsSync is blocking and stalls your whole app while it is working. This can be appealing to new Node users because they can simply continue their code on the next line. However, once you become used to callbacks, blocking the process this way is a far inferior approach for anything that serves concurrent work.

One works synchronously (it waits until the check has finished and returns the result directly), while the other returns immediately and delivers the result later by invoking a callback.
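A minimal sketch of both styles (the file name is just an example). Note that fs.exists is deprecated in current Node releases, partly because its callback receives a single boolean rather than the usual error-first arguments; fs.access() or fs.stat() is generally recommended instead:

const fs = require("fs");

// Asynchronous: the check runs in the background and the callback
// receives the result; other JS keeps running in the meantime.
fs.exists("some-file.txt", (exists) => {
  console.log("async:", exists);
});

// Synchronous: the whole process waits here until the answer is known.
console.log("sync:", fs.existsSync("some-file.txt"));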

Related

Do file operations in NodeJS make use of OS async (polling?) or does it block the event loop?
I want to create a webapp that requires a lot of uploading and I'm worried this may cause delays.
Polling is not used.
does it block the event loop?
It will if you use the sync versions. For example, readFileSync will block other JS from running until the file has been read. This could well be a problem if multiple parts of the script call such a method at different times.
But if you use the asynchronous versions (for example, readFile) which either accept a callback or return a Promise, they will not block other JS from running while the file operation is going on.
When in doubt, use one of the asynchronous versions - I'd only use the sync versions for smaller apps with segments of code that will only run once, such as for the initial reading of a config file, for which preventing other JS from running during that one-time delay isn't an issue.
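A minimal sketch of the two styles (the file names are just examples):

const fs = require("fs");

// Blocking: nothing else in this process runs until the read completes.
const config = fs.readFileSync("config.json", "utf8");

// Non-blocking: the read is handed off to libuv and other JS keeps
// running; the callback fires once the contents are available.
fs.readFile("data.txt", "utf8", (err, contents) => {
  if (err) throw err;
  console.log(contents);
});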

Throttling asynchronous events in NodeJS

I tried using NodeJS in a server-side script to parse the text content in local PDF files using pdf-parse, which in turn uses Mozilla's amazing PDF parser. Everything worked wonderfully in my dev sandbox, but the whole thing came crashing down on me when I attempted to use the same code in production.
My problem was caused by the sheer number of PDF files I'm trying to process asynchronously: I have more than 100K files that need processing, and Mozilla's PDF parser is (understandably) unconditionally asynchronous – the OS killed my node process because of too many open files. I had started by writing all of my code asynchronously (the preliminary part where I search for PDF files to parse), but even after refactoring all the code for synchronous operation, it still kept crashing.
The gist of the problem is related to the cost of the operations: walking the folder structure to look for PDF files is cheap, whereas actually opening the files, reading their contents and parsing them is expensive. So Node kept generating new promises for each file it encountered, and the promises were never fulfilled. If I tried to run the code manually on smaller folders, it worked like a charm – really fast and reliable. As soon as I tried to execute the code on the entire folder structure it crashed, no matter what.
I know Node enthusiasts always answer questions like these by saying the OP is using the wrong programming pattern, but I'm stumped as to what would be the correct pattern in this case.
You need to control how many simultaneous asynchronous operations you start at once. This is under your control. You don't show your code, so we can only advise conceptually.
For example, if you look at this answer:
Promise.all consumes all my RAM
It shows a function called mapConcurrent() that iterates an array, calling an asynchronous function for each element while keeping no more than a maximum number of async operations "in flight" at any given time. You can tune that concurrency limit to your situation.
Another implementation here:
Make several requests to an API that can only handle 20 request a minute
with a function called pMap() that does something similar.
There are other such implementations built into libraries such as Bluebird and Async-promises.
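For reference, a minimal sketch of such a limiter, where parseOnePdf is a hypothetical promise-returning function standing in for the pdf-parse call:

function mapConcurrent(items, maxConcurrent, fn) {
  let index = 0;      // next item to start
  let inFlight = 0;   // operations currently running
  const results = new Array(items.length);

  return new Promise((resolve, reject) => {
    function runNext() {
      if (index === items.length && inFlight === 0) {
        return resolve(results);
      }
      // Top up to the concurrency cap.
      while (inFlight < maxConcurrent && index < items.length) {
        const i = index++;
        inFlight++;
        fn(items[i]).then((result) => {
          results[i] = result;
          inFlight--;
          runNext();
        }, reject);
      }
    }
    runNext();
  });
}

// e.g. never more than 10 PDFs open at once:
// mapConcurrent(pdfPaths, 10, parseOnePdf).then((all) => console.log(all.length));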

How is Node.js asynchronous AND single-threaded at the same time?

The question title basically says it all, but to rephrase it: what handles the asynchronous function execution if the main (and only) thread is occupied with running the main code block?
So far I have only found that the async code gets executed elsewhere or outside the main thread, but what does this mean specifically?
EDIT: The proposed Node.js Event loop question's answers may also address this topic, but I was looking for a less complex, more specific answer, rather than an explanation of the whole Node.js concept. Also, it does not show up in a search for anything similar to "node asynchronous single-threaded".
EDIT, #Mr_Thorynque: Running a query to get data from a database and log it to the console. Nothing gets logged, because Node, being async, does not wait for the query to finish and for the data to populate. (This is just an example as requested, NOT part of my question.)
var data;
mysql.query(`SELECT *some rows from database*`, function (err, rows, fields) {
  rows.forEach(function (row) {
    data += row; // *gather the requested data*
  });
});
console.log(data); // runs before the query callback, so nothing useful is logged
What it really comes down to is that the node process (which is single-threaded) hands the work off to "something else". This could be the OS's I/O subsystem, a networked resource, or whatever. By handing it off, Node frees its thread to keep working on the next in-memory thing. It uses handles to keep track of any pending work, and the event loop marries the two back together and fires the callback when the work is done.
Note that this is also why you can block processing in your own code if it is poorly designed. If your code runs complex tasks and doesn't hand off the work, you'll block the single thread.
This is about as simple an answer as I can make, I think that the details are well explained in the links in your comments.
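Applied to the example above, the fix is to consume the data inside the callback, once the event loop has married the work back together:

mysql.query(`SELECT *some rows from database*`, function (err, rows, fields) {
  if (err) throw err;
  var data = "";
  rows.forEach(function (row) {
    data += row; // *gather the requested data*
  });
  console.log(data); // runs only after the query has finished
});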

Global Exception Manager in Node

I'm using process.on('uncaughtException') to catch any exceptions that unexpectedly come up. In the handler I write the data to a file and send an email; in the future it will likely do more.
Is there a way I can encapsulate the process.on() handler in a file and then somehow require it in all the files that make up the application, so I don't need to add that chunk of code to each file?
Node normally runs in a single process, so you only need process.on('uncaughtException') in one place.
The exception is if you use the cluster module or otherwise spawn other node processes, in which case you need the process.on('uncaughtException') loaded once for each process, but still not once for each file.
(Be careful doing too much in this handler, because by this point the process is considered unstable. I'm also not sure if async work is guaranteed to be run. The docs say that the correct use of 'uncaughtException' is to perform synchronous cleanup of allocated resources.)
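So a single module, required once from the entry point, covers the whole app. A minimal sketch (the file names and logging details are hypothetical):

// crash-handler.js: require this once from your app's entry point.
const fs = require("fs");

process.on("uncaughtException", (err) => {
  // Keep the work synchronous: the process is already in an unstable state.
  fs.writeFileSync("crash.log", err.stack + "\n", { flag: "a" });
  process.exit(1);
});

// in app.js (or whatever your entry point is):
// require("./crash-handler");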

Understanding the Event-Loop in node.js

I've been reading a lot about the Event Loop, and I understand the abstraction provided whereby I can make an I/O request (let's use fs.readFile('foo.txt')) and just pass in a callback that will be executed once a particular event indicating completion of the file read is fired. However, what I do not understand is where the function that does the work of actually reading the file is executed. JavaScript is single-threaded, yet there are two things happening at once: the execution of my node.js file and of some program/function actually reading data from the hard drive. Where does this second function take place in relation to Node?
The Node event loop is truly single-threaded. When we start up a program with Node, a single instance of the event loop is created and placed into one thread.
However, for some standard library calls, the Node C++ side and libuv decide to do the expensive work outside of the event loop entirely, so they never block the main loop. Instead they use a thread pool: a set of (by default) four threads for computationally intensive or blocking tasks. Only a handful of things use this pool, chiefly dns.lookup(), the fs module's file operations, some crypto functions such as pbkdf2(), and zlib compression. Network sockets do not use the pool (libuv relies on the OS's non-blocking I/O for those), and everything else executes in the main thread.
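A small experiment that makes the four-thread default visible, assuming a machine with at least four cores: five identical crypto.pbkdf2() calls are started together, and the fifth consistently finishes later because it has to wait for a pool thread to free up.

const crypto = require("crypto");
const start = Date.now();

// Kick off five CPU-heavy hashes at once; the default pool has four threads.
for (let n = 1; n <= 5; n++) {
  crypto.pbkdf2("password", "salt", 100000, 512, "sha512", () => {
    console.log(`hash ${n} done after ${Date.now() - start} ms`);
  });
}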
"Of course, on the backend, there are threads and processes for DB access and process execution. However, these are not explicitly exposed to your code, so you can’t worry about them other than by knowing that I/O interactions e.g. with the database, or with other processes will be asynchronous from the perspective of each request since the results from those threads are returned via the event loop to your code. Compared to the Apache model, there are a lot less threads and thread overhead, since threads aren’t needed for each connection; just when you absolutely positively must have something else running in parallel and even then the management is handled by Node.js." via http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
It's like using setTimeout(function(){ /* file reading code here */ }, 1000);. JavaScript can have several things pending side by side, like three setInterval(function(){ /* code to execute */ }, 1000); timers, so in a way it looks multi-threaded, but the callbacks still execute one at a time on the single thread. For actually reading from or writing to the hard drive in NodeJS, you can use:
var child = require("child_process");
var fs = require("fs");

function put_text(file, text) {
  // Shell out to write the file; the child process runs outside Node's thread.
  child.exec("echo " + text + " > " + file);
}

function get_text(file, callback) {
  // jQuery doesn't exist on the server; the fs module does the reading,
  // and libuv performs the actual disk I/O on a worker thread.
  fs.readFile(file, "utf8", callback);
}
These can also be used for reading and writing to/from the hard drive using NodeJS.
