node-postgres 'event emitter style' vs 'callback style' - node.js

node-postgres states the following:
node-postgres supports both an 'event emitter' style API and a 'callback' style. The
callback style is more concise and generally preferred, but the evented API can come in
handy. They can be mixed and matched.
With the event emitter API, I can do the following:
var db = new pg.Client("insert-postgres-connection-info");
db.connect();
And then I can use db to execute queries throughout my web app using db.query('sql statement here'). With the callback style, I would do the following each time I want to run a query:
pg.connect(conString, function(err, client) {
  client.query("sql statement", function(err, result) {
    // do stuff
  });
});
So my question is why is it "generally preferred" to use the callback style? Isn't it inefficient to open a connection each time you do something with the database? What benefits are there from using the callback style?
EDIT
I might be mistaken as to what he means by "callback style" (I'm not kidding, my JavaScript isn't very strong) but my question is about the method of connection. I assumed the following was the callback style connection method:
// Simple, using built-in client pool
var pg = require('pg');
//or native libpq bindings
//var pg = require('pg').native
var conString = "tcp://postgres:1234@localhost/postgres";
//error handling omitted
pg.connect(conString, function(err, client) {
  client.query("SELECT NOW() as when", function(err, result) {
    console.log("Row count: %d", result.rows.length); // 1
    console.log("Current year: %d", result.rows[0].when.getYear());
  });
});
and the following is the EventEmitter API connection method:
// Evented api
var pg = require('pg'); //native libpq bindings = `var pg = require('pg').native`
var conString = "tcp://postgres:1234@localhost/postgres";
var client = new pg.Client(conString);
client.connect();
If I'm just getting terms mixed up here, my question still remains. pg.connect(do queries) opens a new connection every time you use it (doesn't it?) whereas
var client = new pg.Client(conString);
client.connect();
opens a connection and then allows you to use client to run queries when necessary, no?

The EventEmitter style is more for this type of thing:
var query = client.query("SELECT * FROM beatles WHERE name = $1", ['John']);
query.on('row', function(row) {
  console.log(row);
  console.log("Beatle name: %s", row.name); // Beatle name: John
  console.log("Beatle birth year: %d", row.birthday.getYear()); // dates are returned as javascript dates
  console.log("Beatle height: %d' %d\"", Math.floor(row.height/12), row.height%12); // integers are returned as javascript ints
});
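In this older evented API the query object also emits 'end' and 'error' events (an 'end' handler appears later on this page as well), so a slightly fuller sketch of the event emitter style would be:
var query = client.query("SELECT * FROM beatles WHERE name = $1", ['John']);

query.on('error', function(err) {
  // the query failed; handle or log the error
  console.error(err);
});

query.on('end', function() {
  // emitted after the last 'row' event; the client is free for the next query
  console.log("query complete");
});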
By mixing and matching, you should be able to do the following:
// Connect using EE style
var client = new pg.Client(conString);
client.connect();
// Query using callback style
client.query("SELECT NOW() as when", function(err, result) {
console.log("Row count: %d",result.rows.length); // 1
console.log("Current year: %d", result.rows[0].when.getYear());
});
Note that even when using the callback style, you wouldn't open a connection every time you want to execute a query; most likely, you'd open a connection when the application starts and use it throughout.
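A minimal sketch of that pattern (the db.js module name and the place it is required from are only illustrative):
// db.js -- create and connect a single client when the app starts
var pg = require('pg');
var conString = "tcp://postgres:1234@localhost/postgres";

var client = new pg.Client(conString);
client.connect();

module.exports = client;

// elsewhere in the app
var db = require('./db');

db.query("SELECT NOW() as when", function (err, result) {
  // reuses the connection opened at startup
});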

There are pros and cons and the one you choose depends on your use case.
Use case 1: Return the result set to the client row-by-row.
If you're going to return data to the client much in the same way it comes out of the database - row by row - then you can use the event emitter style to reduce latency, which I define here as the time between issuing the request and receiving the first row. If you used the callback style instead, latency would be increased.
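For illustration, a rough sketch of use case 1, assuming an Express-style res object and the beatles table from above:
// Stream each row out as soon as it arrives instead of buffering the whole result
var query = client.query("SELECT * FROM beatles");

query.on('row', function (row) {
  // the first byte reaches the client as soon as the first row is available
  res.write(JSON.stringify(row) + '\n');
});

query.on('end', function () {
  res.end();
});

query.on('error', function (err) {
  res.statusCode = 500;
  res.end();
});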
Use case 2: Return a hierarchical data structure (e.g. JSON) based on the entire result set.
If you're going to return data to the client in a hierarchical data structure such as JSON (which you would do to save bandwidth when the result set is a flat representation of a hierarchy), you should use the callback style because you can't return anything until you have received all rows. You could use the event emitter style and accumulate rows (node-postgres provides such a mechanism so you don't have to maintain a map of partially built results by query), but it would be a pointless waste of effort because you can't return any results until you have received the last row.
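A sketch of use case 2, with a hypothetical flat result set (author_name, book_title columns) folded into a nested object before anything is sent:
client.query("SELECT author_name, book_title FROM books ORDER BY author_name",
  function (err, result) {
    if (err) { /* handle the error */ return; }

    // fold the flat rows into { author: [titles...] }
    var byAuthor = {};
    result.rows.forEach(function (row) {
      byAuthor[row.author_name] = byAuthor[row.author_name] || [];
      byAuthor[row.author_name].push(row.book_title);
    });

    // nothing could have been sent before the last row arrived anyway
    res.end(JSON.stringify(byAuthor));
  });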
Use case 3: Return an array of hierarchical data structures.
When returning an array of hierarchical data structures, you will have a lot of rows to get through all at once if you use the callback style. This would block for a significant amount of time which isn't good because you have only one thread to service many clients. So you should use the event emitter style with the row accumulator. Your result set should be ordered such that when you detect a change in value of a particular field, you know the current row represents the beginning of a new result to return and everything accumulated so far represents a now complete result which you can convert to your hierarchical form and return to the client.
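A sketch of use case 3 using the event emitter style, again with the hypothetical books table, ordered by author_name so that a change in that column marks the end of a group:
var query = client.query(
  "SELECT author_name, book_title FROM books ORDER BY author_name");

var currentAuthor = null;
var titles = [];

function flush() {
  if (currentAuthor !== null) {
    res.write(JSON.stringify({ author: currentAuthor, books: titles }) + '\n');
  }
}

query.on('row', function (row) {
  if (row.author_name !== currentAuthor) {
    flush();                        // the previous group is now complete
    currentAuthor = row.author_name;
    titles = [];
  }
  titles.push(row.book_title);
});

query.on('end', function () {
  flush();                          // emit the final group
  res.end();
});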

Related

A persistent multiset that is built up sequentially and processed at once for Node.js

In Node.js I am trying to get the following behaviour: During the runtime of my Express application I accumulate several IDs of objects that need further processing. For further processing, I need to transmit these IDs to a different service. The other service however, cannot handle a lot of requests, but rather requires batch transmission. Hence I need to accumulate a lot of individual requests to a bigger one while allowing persistence.
tl;dr — Over 15 minutes, several IDs are accumulated in my application; after this 15-minute window, they're all emitted at once. At the same time, the next window is opened.
From my investigation, this is probably the abstract data type of a multiset: The items in my multiset (or bag) can have duplicates (hence the multi-), they're packaged by the time window, but have no index.
My infrastructure is already using redis, but I am not sure whether there's a way to accumulate data into one single job. Or is there? Are there any other sensible ways to achieve this kind of behavior?
I might be misunderstanding some of the subtlety of your particular situation, but here goes.
Here is a simple sketch of some code that processes a batch of 10 items at a time. The way you would do this differs slightly depending on whether the processing step is synchronous or asynchronous. You don't need anything more complicated than an array for this, since arrays have constant-time push and length methods, and those are the only operations you need. You may also want to add an option to flush the batch after a given time interval, since your use case batches by time.
Synchronous example:
var batch = [];
var batchLimit = 10;

var sendItem = function (item) {
  batch.push(item);
  if (batch.length >= batchLimit) {
    processBatchSynch(batch);
    batch = [];
  }
};
Asynchronous example:
// note that in this case the job of emptying the batch array
// has to be done inside the callback.
var batch = [];
var batchLimit = 10;

// your callback might look something like function(err, data) { ... }
var sendItem = function (item, cb) {
  batch.push(item);
  if (batch.length >= batchLimit) {
    processBatchAsync(batch, cb);
  }
};
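Since your window is time-based rather than count-based, a timed flush could be layered on top of the same array; a rough sketch, reusing the batch and processBatchAsync names from above:
var flushIntervalMs = 15 * 60 * 1000; // 15 minutes

setInterval(function () {
  if (batch.length === 0) return;

  // hand the current window off and start a fresh one immediately
  var toProcess = batch;
  batch = [];

  processBatchAsync(toProcess, function (err) {
    if (err) { /* the items in toProcess were not persisted; handle the error */ }
  });
}, flushIntervalMs);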
I came up with an npm module to solve this specific problem, using a MySQL database for persistence:
persistent-bag: This is to bags like redis is to queues. A bag (or multiset) is filled over time and processed at once.
On instantiation of the object, the required table is created in the provided MySQL database if necessary.
var PersistentBag = require('persistent-bag');

var bag = new PersistentBag({
  host: 'localhost',
  port: '3306',
  user: 'root',
  password: '',
  database: 'test'
});
Then items can be .add()ed during the runtime of any number of applications:
var item = {
  title: 'Test item to store to bag'
};

bag.add(item, function (err, itemId) {
  console.log('Item id: ' + itemId);
});
Working with the emitted, aggregated items every 15 minutes is done as in kue for Redis, by subscribing to .process():
bag.process(function worker(bag, done) {
  // bag.data is now an array of all items
  doSomething(bag.data, function () {
    done();
  });
});

MongoDB NodeJS driver, how to know when .update()'s are complete

As the code is too large to post here, I have appended my GitHub repo: https://github.com/DiegoGallegos4/Mongo
I am trying to use the NodeJS driver to update some records fulfilling one criteria, but first I have to find some records fulfilling another criteria. In the update part, the records found and filtered by the find operation are used. This is:
file: weather1.js
MongoClient.connect(some url, function (err, db) {
  db.collection(collection_name)
    .find({}, {}, sort_criteria)
    .toArray(function (err, data) {
      // ... find the data and append to an array
      // ... then, inside a for loop over that data:
      db.collection(collection_name).update(data[i], { $set: ... }, callback);
    });
});
That is the structure used to solve the problem. Regarding when to close the connection: it is closed when the length of the data array equals the number of callbacks fired by the update operation. For more details you can refer to the repo.
file: weather.js
In the other approach, .each is used instead of toArray to iterate over the cursor.
I've been looking for a solution to this for a week now, on several forums.
I've read about pooling connections, but I want to know what my conceptual error is in my code. I would appreciate a deep insight on this topic.
The way you pose your question is very misleading. All you want to know is: "When is the processing complete so I can close the connection?"
The answer to that is that you need to respect the callbacks and, generally, only move through the cursor of results once each update is complete.
The simple way, without other dependencies, is to use the stream interface supported by the driver:
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/data', function (err, db) {
  if (err) throw err;
  var coll = db.collection('weather');
  console.log('connection established');

  var stream = coll.find().sort([['State', 1], ['Temperature', -1]]);

  stream.on('error', function (err) {
    throw err;
  });

  stream.on('end', function () {
    db.close();
  });

  var month_highs = [];
  var state = '';
  var length = 0;

  stream.on('data', function (doc) {
    stream.pause(); // pause processing documents

    if (doc) {
      length = month_highs.length;
      if (state != doc['State']) {
        month_highs.push(doc['State']);
        //console.log(doc);
      }
      state = doc['State'];

      if (month_highs.length > length) {
        coll.update(doc, { $set: { 'month_high': true } }, function (err, updated) {
          if (err) throw err;
          console.log(updated);
          stream.resume(); // resume processing documents
        });
      } else {
        stream.resume();
      }
    } else {
      stream.resume();
    }
  });
});
That's just a copy of the code from your repo, refactored to use a stream. So all the important parts are where the word "stream" appears, and most importantly where they are being called.
In a nutshell, the "data" event is emitted for each document from the cursor results. First you call .pause() so new documents do not overrun the processing. Then you do your .update() and, within its callback on return, you call .resume(), and the flow continues with the next document.
Eventually "end" is emitted when the cursor is depleted, and that is where you call db.close().
That is basic flow control. For other approaches, look at the node async library as a good helper. But do not loop arrays with no async control, and do not use .each() which is DEPRECATED.
At any rate, you need to signal when the .update() callback is complete before moving on to a new "loop iteration". This is the basic, no-additional-dependency approach.
P.S. I am a bit suspicious about the general logic of your code, especially testing whether the length of something is greater when you read it without possibly changing that length. But this is all about how to implement "flow control", and not about fixing the logic in your code.
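As for the async library route mentioned above, a rough sketch of the same flow control with async.eachSeries (fetching the documents first and omitting the month-high filtering for brevity) would be:
var async = require('async');

// inside the same MongoClient.connect callback, so `coll` and `db` are in scope
coll.find().sort([['State', 1], ['Temperature', -1]]).toArray(function (err, docs) {
  if (err) throw err;

  // run the updates strictly one after the other
  async.eachSeries(docs, function (doc, callback) {
    coll.update(doc, { $set: { month_high: true } }, callback);
  }, function (err) {
    if (err) throw err;
    db.close(); // every update has completed
  });
});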

node.js, pg module and done() method

Using the pg module and its client pool, I need to call the done() method in order to return the client to the pool.
Once I connect to the server, I add an SQL query to the client's query queue and start handling the result asynchronously, row by row, in the row event:
// Execute SQL query
var query = client.query("SELECT * FROM categories");
// Handle every row asynchronously
query.on('row', handleRow );
When should I call the done() method?
Should I call it once I receive the end event and all rows are processed, or can I call it immediately after I add the SQL query to the client's query queue?
Going from an example on this project's page (https://github.com/brianc/node-pg-query-stream), I'd recommend calling it when you get the end event.
This makes sense, because you're not done with it until you've received the last row. If someone else got that same connection and tried using it, that would likely create odd errors.
The former makes sense: you would want to call it once you know you have processed all rows for your query.
// your DB connection info
var conString = "pg://admin:admin@localhost:5432/Example";

var pg = require("pg");
var client = new pg.Client(conString);
client.connect();

// Your own query
var query = client.query("SELECT * FROM mytable");

query.on("row", function (row, result) {
  // do your stuff with each row
  result.addRow(row);
});

query.on("end", function (result) {
  // here you have the complete result
  console.log(JSON.stringify(result.rows, null, 2));
  // end when done ;)
  client.end();
});
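If you are using the pooled pg.connect(...) form from your question, a minimal sketch (assuming an older node-postgres release where pg.connect hands the callback a done function) would be:
pg.connect(conString, function (err, client, done) {
  if (err) { /* handle the connection error */ return; }

  var query = client.query("SELECT * FROM categories");

  query.on('row', handleRow);

  query.on('end', function () {
    done(); // all rows received -- return the client to the pool
  });

  query.on('error', function (err) {
    done(err); // pass an error so the client is discarded rather than reused
  });
});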

Meteor client synchronous server database calls

I am building an application in Meteor that relies on real time updates from the database. The way Meteor has laid out the examples is to have the database call under the Template call. I've found that when dealing with medium sized datasets this becomes impractical. I am trying to move the request to the server, and have the results passed back to the client.
I have looked at similar questions on SO but have found no immediate answers.
Here is my server side function:
Meteor.methods({
  "getTest": function () {
    var res = Data.find({}, { sort: { time: -1 }, limit: 10 });
    var r = res.fetch();
    return (r);
  }
});
And client side:
Template.matches._matches = function () {
  var res = {};
  Meteor.call("getTest", function (error, result) {
    res = result;
  });
  return res;
};
I have tried variations of the above code - returning in the callback function as one example. As far as I can tell, having a callback makes the function asynchronous, so it cannot be called onload (synchronously) and has to be invoked from the client.
I would like to pass all database queries server side to lighten the front end load. Is this possible in Meteor?
Thanks
The way to do this is to use subscriptions instead of remote method calls. See the counts-by-room example in the docs. So, for every database call, you have a collection that exists client-side only. The server then decides which records are in the collection using set and unset.
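A minimal sketch of that, in its simplest form where the publish function just returns a cursor (the publication name latestData is illustrative, and Data is assumed to be a collection defined on both client and server):
// server
Meteor.publish("latestData", function () {
  return Data.find({}, { sort: { time: -1 }, limit: 10 });
});

// client: subscribe once, then query the local cache synchronously in the helper
Meteor.subscribe("latestData");

Template.matches._matches = function () {
  return Data.find({}, { sort: { time: -1 }, limit: 10 });
};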

How to write through to a db one by one from evented input that arrives too fast in node.js

I receive input from a MS SQL SELECT query using the tedious driver.
I have attached a listener to the "row" event of the reader:
request.on('row', function (columns) {
  insert_row_other_db(columns);
});
I am writing the results to another database in the insert_row_other_db function.
But the rows arrive much faster than they can be written, and I want to open only one connection. What is a good way to de-asynchronize the writes to the other db? I would like to write the rows one after the other.
Assuming you are able to receive a callback when insert_row_other_db completes, you can use the Async library to create a queue that you can use to schedule your inserts. The queue has a concurrency setting that allows you to limit the number of tasks that can run at once.
var async = require("async");

var insertQueue = async.queue(function (columns, callback) {
  insert_row_other_db(columns, callback);
}, 1);

request.on("row", function (columns) {
  insertQueue.push(columns);
});
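push also accepts a per-task callback, which is a convenient place to surface insert errors; a small extension of the handler above:
request.on("row", function (columns) {
  insertQueue.push(columns, function (err) {
    if (err) {
      // the insert for this row failed
      console.error(err);
    }
  });
});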
