Are Database connections asynchronous in Node? - node.js

Node is single-threaded, but there are a lot of functions(modules like http, fs) that allow us to do a background task and the event loop takes care of executing the callbacks.
However, is this true for a database connection?
Let's say I have the following code.
const mysql = require('mysql');
function callDatabase(id) {
var result;
var connection = mysql.createConnection(
{
host : '192.168.1.14',
user : 'root',
password : '',
database : 'test'
}
);
connection.connect();
var queryString = 'SELECT name FROM test WHERE id = 1';
connection.query(queryString, function(err, rows, fields) {
if (err) throw err;
for (var i in rows) {
result = rows[i].name;
}
connection.end();
return result;
});
}
Does, mysql.createConnection, connection.connect, connection.query, connection.end spin up a new thread to execute in the background, leaving Node to run the remaining synchronous code?
If yes, in what queue will the callback be enqueued and how to write this sort of code such that a background task is initiated.

Anything that may be blocking (file system operations, network connections, etc) are generally asynchronous in Node, in order to avoid blocking on the main thread. That these functions take a parameter for a callback function is a sure hint that you have asynchronous operations (or "background tasks") in progress.
You don't show it in your sample code, but connect() and end() do take callback functions so you know when a connection is actually made or ends. It looks like the mysql library, however, also maintains an internal queue to make sure you can't attempt a query until a connection has been made and that only one operation at a time can be executed.
Note that createConnection() does not have a callback function. All it does is create a new data structure (connection) that gets used. It doesn't do any I/O itself, so doesn't need to run asynchronously.
Also note that you don't generally "spin up" your own threads. Node takes care of this thread management for you (largely by running things on the main worker thread), for the most part, and hides how threads themselves work for most developers. You typically hear that Node is "single threaded", and you should treat it this way.
Modern Node code makes extensive use of async/await and Promises to do this sort of thing. Slightly older code uses callback functions. Even older code uses Node events. In reality - if you dig far enough down, they're all using events and possibly presenting the simplified (more modern) interfaces.
The mysql module appears to date from the "callback" era and hasn't yet been updated for Promises/async/await. Under the covers, as noted, it uses Node events to track network (or unix domain socket) connections and transfers.

Related

node.js: How to lock/synchronize a block of code?

Let's take the simple code snippet:
var express = require('express');
var app = express();
var counter = 0;
app.get('/', function (req, res) {
// LOCK
counter++;
// UNLOCK
res.send('hello world')
})
Let's say that app.get(...) is called a huge number of times, and as you can understand I don't want the line counter++ to be executed concurrently by the two different threads.
Therefore, I want to lock this line that only one thread can have access to this line. My question is how to do it in node.js?
I know there is a lock package: https://www.npmjs.com/package/locks, but I'm wondering whether there is a "native" way of doing it without an external library.
I don't want the line counter++ to be executed concurrently by the two different threads
That cannot happen in node.js with just regular Javascript coding.
node.js is single threaded and event-driven, so there's only ever one piece of Javascript code running at a time that can access that variable. You do not have to worry about the typical pre-emptive concurrency issues of multi-threaded systems.
That said, you can still have concurrency issues in node.js if you are using asynchronous code because the node.js asynchronous model returns control back to the system to process the next event and the asynchronous callback gets called on some future event. But, the concurrency issues are non-pre-emptive so you fully control when they can occur.
If you show us your actual code in your app.get() route handler, then we can advise more specifically about whether you do or don't have a concurrency issue there or not. And, if you do, we can advise on how to best deal with that.
Threads in the thread pool are all native code that runs behind the scenes. They only trigger actual Javascript to run by queuing events through the event queue. So, because all Javascript that runs is serialized through the event queue, you only get one piece of Javascript ever running at a time. The basic scheme of the event queue is that the interpreter runs a piece of Javascript until it returns control back to the system. At that point, the interpreter looks in the event queue and if there's an event waiting, it pulls that event out and calls the callback associated with that event. Meanwhile, if there is native code running in the background, when it completes, it adds an event to the event queue. That event is not processed until the current Javascript returns control back to the system and it can then grab the next event out of the event queue. So, it's this event-queue that serializes running only one piece of Javascript at a time.
Edit: Nodejs does now have WorkerThreads which enable separate threads of Javascript, but each thread has its own heap and its own variables so a variable from one thread cannot be directly accessed from another thread. You can configure shared memory that both WorkerThreads can access, but that isn't straight variables, but blocks of memory and if you want to use shared memory, then you do indeed need to code your own synchronization methods to make sure you are atomically accessing the variable. The code you show in your question is not using any of this so the access to the counter variable is already atomic and cannot be simultaneously accessed by any other Javascript, even if you are using WorkerThreads.
If you block thread none of the requests will execute all will be in the queue.
It 's not good practice to block the thread in Node.js
var express = require('express');
var app = express();
var counter = 0;
const getPromise = () => {
return new Promise((resolve) => {
setTimeout(() => {
resolve('Done')
}, 100);
});
}
app.get('/', async (req, res) => {
const localCounter = counter++;
// Use local counter for rest of operation so value won't vary
// LOCK: Use promise/callback
await getPromise(); // Not locked but waiting for getPromise to finish
console.log(localCounter); // Same value before lock
res.send('hello world')
})
Node.js is single-threaded, which means that any single process running your app will not have data races like you anticipate. In fact, a quick inspection of the locks library shows that they use a boolean flag and a system of Array objects to determine whether something is locked or not.
You should only really worry about this if you plan on sharing data with multiple processes. In that case, you could use Alan's lockfile approach from this stackoverflow thread here.

Non blocking Loop in Node.js and pooling?

I'm starting to play around with node.js and I have an application which basically iterating over dozens thousands of object and performing some various asynchronous http requests for all of them and populate the object with various data returned from the http requests..
This question is more concerning best practices with Node.js, non blocking operations and probably related to pooling.
Forgive me If I'm using the wrong term, as I'm new to this and please don't hesitate to correct me.
So below is a brief summary of the code
I have got a loop which kind of doing iterating over thousands
//Loop briefly summarized
for (var i = 0; i < arrayOfObjects.length; i++) {
do_something(arrayOfObjects[i], function (error, result){
if(err){
//various log
}else{
console.log(result);
}
});
}
//dosomething briefly summarized
function do_something (Object, callback){
http.request(url1, function(err, result){
if(!err){
insert_in_db(result.value1, function (error,result){
//Another http request with asynchronous
});
}else{
//various logging error
}
});
http.request(url2, function(err, result){
//some various logic including db call
});
}
In reality in do_something there is a complex logic but it's not really the matter right now
So my problem are the following
I think the main issue is in my loop is not really optimized because it's kind of a blocking event.
So the first http request results within dosomething are avaialble are after the loops is finished processing and then it's cascading.
If there a way somehow to make kind of pool of 10 or 20 max simualtenous execution of do_something and the rest are kind of queued when a pool ressource is available?
I hope I clearly explained myself , don't hesitate to ask me if I need to details.
Thanks in advance for your feedbacks,
Anselme
Your loop isn't blocking, per se, but it's not optimal. One of the things it does is schedules arrayOfObjects.length http requests. Those requests will all be scheduled right away, as your loop progresses.
In older versions of node.js, you would have had the benefit of default of 5 concurrent requests per host, but that default is later changed.
But then the actual opening of sockets, sending requests, waiting for responses, this will be individual for each loop. And each entry will finish in it's own time (depending, in this case, on the remote host, or e.g. database response times etc).
Take a look at async, vasync, or some of it's many alternatives, as suggested in comments, for pooling.
You can take it even a step further and use something like Bluebird Promise.map, with concurrency option set, depending on your use case.

nodejs - When to use Asynchronous instead of Synchronous function?

I've read some articles about this stuff. However, I still get stuck in a point. For example, I have two function:
function getDataSync(){
var data = db.query("some query");
return JSON.stringify(data);
}
function getDataAsync(){
return db.query("some query",function(result){
return JSON.stringify(result);
});
}
People said that asynchronous programming is recommended in IO bound. However, I can't see anything different in this case. The Async one seems to be more ugly.
What's wrong with my point?
nodejs is asynchronous by default which mean that it won't execute your statement in order like in other language for example
database.query("SELECT * FROM hugetable", function(rows) {
var result = rows;
});
console.log("Hello World");
In other language, it will wait until the query statement finish execution.
But in nodejs, it will execute the query statement separately and continue execute to log Hello World to the screen.
so when you say
function getDataSync(){
var data = db.query("some query");
return JSON.stringify(data);
}
it will return data before db.query return data
function getDataAsync(){
return db.query("some query",function(result){
return JSON.stringify(result);
});
}
but in node.js way the function that pass as parameter is called callback which mean it will call whenever the getDataAsync() finish its execution
We use callback in nodejs because we don't know when db.query() finishes its execution (as they don't finish execution in order) but when it finishes it will call the callback.
In your first example, the thread will get blocked at this point, until the data is retrieved from the db
db.query("some query");
In the second example, the thread will not get blocked but it will be available to serve more requests.
{
return JSON.stringify(result);
}
This function will be called as soon as the data is available from the db.
That's why node.js is called non-blocking IO which means your thread is never blocked.
Asynchronous is used when some operation blocks the execution. It is not an problem in Multi thread application or server. But NodeJS is single threaded application which should not blocked by single operation. If it is blocked by operation like db.query("some query");, Node.js will wait to finish it.
It is similar to you just stands idle in front of rice cooker until it is cooked. Generally, you will do other activities while rice is cooking. When whistle blows, you can do anything with cooked rice. Similarly NodeJS will sends the asychronous operation to event loop which will intimate us when operation is over. Meanwhile Node.js can do processing other operation like serving other connections.
You said Ugly. Asynchronous does not mean to use only callbacks. You can use Promise, co routine or library async.

Why does Node.js have both async & sync version of fs methods?

In Node.js, I can do almost any async operation one of two ways:
var file = fs.readFileSync('file.html')
or...
var file
fs.readFile('file.html', function (err, data) {
if (err) throw err
console.log(data)
})
Is the only benefit of the async one custom error handling? Or is there really a reason to have the file read operation non-blocking?
These exist mostly because node itself needs them to load your program's modules from disk when your program starts. More broadly, it is typical to do a bunch a synchronous setup IO when a service is initially started but prior to accepting network connections. Once the program is ready to go (has it's TLS cert loaded, config file has been read, etc), then a network socket is bound and at that point everything is async from then on.
Asynchronous calls allow for the branching of execution chains and the passing of results through that execution chain. This has many advantages.
For one, the program can execute two or more calls at the same time, and do work on the results as they complete, not necessarily in the order they were first called.
For example if you have a program waiting on two events:
var file1;
var file2;
//Let's say this takes 2 seconds
fs.readFile('bigfile1.jpg', function (err, data) {
if (err) throw err;
file1 = data;
console.log("FILE1 Done");
});
//let's say this takes 1 second.
fs.readFile('bigfile2.jpg', function (err, data) {
if (err) throw err;
file2 = data;
console.log("FILE2 Done");
});
console.log("DO SOMETHING ELSE");
In the case above, bigfile2.jpg will return first and something will be logged after only 1 second. So your output timeline might be something like:
#0:00: DO SOMETHING ELSE
#1:00: FILE2 Done
#2:00: FILE1 Done
Notice above that the log to "DO SOMETHING ELSE" was logged right away. And File2 executed first after only 1 second... and at 2 seconds File1 is done. Everything was done within a total of 2 seconds though the callBack order was unpredictable.
Whereas doing it synchronously it would look like:
file1 = fs.readFileSync('bigfile1.jpg');
console.log("FILE1 Done");
file2 = fs.readFileSync('bigfile2.jpg');
console.log("FILE2 Done");
console.log("DO SOMETHING ELSE");
And the output might look like:
#2:00: FILE1 Done
#3:00: FILE2 Done
#3:00 DO SOMETHING ELSE
Notice it takes a total of 3 seconds to execute, but the order is how you called it.
Doing it synchronously typically takes longer for everything to finish (especially for external processes like filesystem reads, writes or database requests) because you are waiting for one thing to complete before moving onto the next. Sometimes you want this, but usually you don't. It can be easier to program synchronously sometimes though, since you can do things reliably in a particular order (usually).
Executing filesystem methods asynchronously however, your application can continue executing other non-filesystem related tasks without waiting for the filesystem processes to complete. So in general you can continue executing other work while the system waits for asynchronous operations to complete. This is why you find database queries, filesystem and communication requests to generally be handled using asynchronous methods (usually). They basically allow other work to be done while waiting for other (I/O and off-system) operations to complete.
When you get into more advanced asynchronous method chaining you can do some really powerful things like creating scopes (using closures and the like) with a little amount of code and also create responders to certain event loops.
Sorry for the long answer. There are many reasons why you have the option to do things synchronously or not, but hopefully this will help you decide whether either method is best for you.
The benefit of the asynchronous version is that you can do other stuff while you wait for the IO to complete.
fs.readFile('file.html', function (err, data) {
if (err) throw err;
console.log(data);
});
// Do a bunch more stuff here.
// All this code will execute before the callback to readFile,
// but the IO will be happening concurrently. :)
You want to use the async version when you are writing event-driven code where responding to requests quickly is paramount. The canonical example for Node is writing a web server. Let's say you have a user making a request which is such that the server has to perform a bunch of IO. If this IO is performed synchronously, the server will block. It will not answer any other requests until it has finished serving this request. From the perspective of the users, performance will seem terrible. So in a case like this, you want to use the asynchronous versions of the calls so that Node can continue processing requests.
The sync version of the IO calls is there because Node is not used only for writing event-driven code. For instance, if you are writing a utility which reads a file, modifies it, and writes it back to disk as part of a batch operation, like a command line tool, using the synchronous version of the IO operations can make your code easier to follow.

Asynchronous calls using postgres as an example in NodeJS

When implementing this code (example taken directly from https://github.com/brianc/node-postgres):
var pg = require('pg');
var conString = "tcp://postgres:1234#localhost/postgres";
pg.connect(conString, function(err, client) {
client.query("SELECT NOW() as when", function(err, result) {
console.log("Row count: %d",result.rows.length); // 1
console.log("Current year: %d", result.rows[0].when.getFullYear());
//Code halts here
});
});
After the last console.log, node hangs. I think this is because the asynchronous nature, and I suspect at this point, one should call a callback function.
I have two questions:
Is my thinking correct?
If my thinking is correct, then how does the mechanics work. I know NodeJS is using an event loop, but what is making this event loop halt at this point?
It appears to hang because the connection to Postgres is still open. Until it's closed, or "ended"...
client.end(); // Code halts here
Node will continue to wait in idle for another event to be added to the queue.
Not quite. This is a detail of node-postgres and its dependencies, not of Node or of its "asynchronous nature" in general.
The idling is due to and documented for the generic-pool module that node-postgres uses:
If you are shutting down a long-lived process, you may notice that node fails to exit for 30 seconds or so. This is a side effect of the idleTimeoutMillis behavior -- the pool has a setTimeout() call registered that is in the event loop queue, so node won't terminate until all resources have timed out, and the pool stops trying to manage them.
And, as it explains under Draining:
If you know would like to terminate all the resources in your pool before their timeouts have been reached, you can use destroyAllNow() in conjunction with drain():
pool.drain(function() {
pool.destroyAllNow();
});
One side-effect of calling drain() is that subsequent calls to acquire() will throw an Error.
Which is what pg.end() does and can certainly be done if your intention is to exit at the end of a serial application, such as unit testing or your given snippet.

Resources