Perform non-blocking eval reads in MongoDB

Perform non-blocking eval reads in MongoDB - node.js

I figured out how to run javascript code on the MongoDB server, from a node.js client:
db.eval("function(x){ return x*10; }", 1, function (err, retval) {
console.log('err: '+err);
console.log('retval: '+retval);
});
And that works fine. But the docs say that db.eval() issues a write lock, so that nothing else can read or write to the database. I do not want that.
It also says that eval has no such limitation, but I do not know where to find it. From the way they're talking about it, it seems as if regular eval is only available in the mongo shell, and so not from the client side.
So: how can I run these stored procedures on the mongodb server without blocking everything?

you can pass an object with the field nolock set to true as an optional 3rd parameter to eval:
db.eval('function (x) {return x*10; }', [1], {nolock:true}, function(err, retval) {
console.log('err: '+err);
console.log('retval: '+retval);
});
Note that this prevents eval from setting an obligatory write-lock, but it doesn't prevent any operations inside your function from creating write-locks on their own.
Source: the documentation.
Note that the term "stored procedure" is wrong in this case. A stored procedure refers to code which is stored on the database itself and not delivered by the application layer. MongoDB can also do this utilizing the special collection db.system.js, but doing this is discouraged: http://docs.mongodb.org/manual/applications/server-side-javascript/#storing-functions-server-side
By the way: MongoDB wasn't designed for stored procedures. It is usually recommended to implement any advanced logic on the application layer. The practice to implement even trivial operations as stored procedures, like it is sometimes done on SQL databases, is discouraged.

This is this the way to store your functions on the Server Side and you call use it as shown below:
db.system.js.save( { _id : "myAddFunction" , value : function (x,y)
{ return x +y;} } );
db.system.js.find()
{ "_id" : "myAddFunction", "value" : function (x,y){ return x + y; } }
db.eval( "myAddFunction( 1 ,2)" )
3

Related

Use transactions with mariadb and node.js

Is there a better way to use transactions with the mariasql library than adding BEGIN to the start of the query, and finalizing with either a commit or a rollback?
Currently, if I want to wrap a series of queries in a transaction I have to do something like this:
const MariaClient = require('mariasql');
let client = new MariaClient();
client.connect({
host: "127.0.0.1",
user: "user",
password: "pass",
db: "some_db",
multiStatements: true
});
client.query('BEGIN; INSERT INTO some_table VALUES ("a0","b0"), ("a1","b1"), ("a2","b2");', function (err) {
if (err) {
client.query('ROLLBACK;');
}
client.query('COMMIT;');
});
This seems clunky and potentially error prone. We are using the generic-pool to manage the mariadb client, so it seems like there could be some unintended consequences handling transactions this way.

Plan A: If autocommit is set to 1, then each statement is its own transaction. No BEGIN/COMMIT needed.
Plan B: Suck it up and use separate calls to the API for each statement here:
BEGIN;
some SQL statement;
some SQL statement;
some SQL statement;
COMMIT;
If the API has special calls for BEGIN and COMMIT, use them instead of performing the corresponding SQL; there may be something important hiding in the call.
In both cases you must check for errors at all steps. Deadlocks can happen when you least expect them.

how to define functions in redis \ lua?

I'm using Node.js, and the 'redis-scripto' module, and I'm trying to define a function in Lua:
var redis = require("redis");
var redisClient = redis.createClient("6379","127.0.0.1");
var Scripto = require('redis-scripto');
var scriptManager = new Scripto(redisClient);
var scripts = {'add_script':'function add(i,j) return (i+j) end add(i,j)'};
scriptManager.load(scripts);
scriptManager.run('add_script', [], [1,1], function(err, result){
console.log(err || result);
});
so I'm getting this error:
[Error: ERR Error running script (call to .... #enable_strict_lua:7: user_script:1: Script attempted to create global variable 'add']
so I've found that it's a protection, as explained in this thread:
"The doc-string of scriptingEnableGlobalsProtection indicates that intent is to notify script authors of common mistake (not using local)."
but I still didn't understand - where is this scripting.c ? and the solution of changing global tables seems risky to me.
Is there no simple way of setting functions to redis - using lua ?

It looks like the script runner is running your code in a function. So when you create function add(), it thinks you're doing it inside a function by accident. It'll also, likely, have a similar issue with the arguments to the call to add(i,j).
If this is correct then this should work:
local function add(i,j) return (i+j) end return add(1,1)
and if THAT works then hopefully this will too with your arguments passed from the js:
local function add(i,j) return (i+j) end return add(...)

You have to assign an anonymous function to a variable (using local).
local just_return_it = function(value)
return value
end
--// result contains the string "hello world"
local result = just_return_it("hello world")
Alternately, you can create a table and add functions as fields on the table. This allows a more object oriented style.
local obj = {}
function obj.name()
return "my name is Tom"
end
--// result contains "my name is Tom"
local result = obj.name()
Redis requires this to prevent access to the global scope. The entire scripting mechanism is designed to keep persistence out of the scripting layer. Allowing things in the global scope would create an easy way to subvert this, making it much more difficult to track and manage state on the server.
Note: this means you'll have to add the same functions to every script that uses them. They aren't persistent on the server.

This worked for me: (change line 5 from the question above):
var scripts = {
'add_script':'local function add(i,j) return (i+j) end return(add(ARGV[1],ARGV[2]))'};
to pass variables to the script use KEYS[] and ARGV[], as explained in the redis-scripto guide
and also there's a good example in here: "Lua: A Guide for Redis Users"

How to populate mongoose with a large data set

I'm attempting to load a store catalog into MongoDb (2.2.2) using Node.js (0.8.18) and Mongoose (3.5.4) -- all on Windows 7 64bit. The data set contains roughly 12,500 records. Each data record is a JSON string.
My latest attempt looks like this:
var fs = require('fs');
var odir = process.cwd() + '/file_data/output_data/';
var mongoose = require('mongoose');
var Catalog = require('./models').Catalog;
var conn = mongoose.connect('mongodb://127.0.0.1:27017/sc_store');
exports.main = function(callback){
var catalogArray = fs.readFileSync(odir + 'pc-out.json','utf8').split('\n');
var i = 0;
Catalog.remove({}, function(err){
while(i < catalogArray.length){
new Catalog(JSON.parse(catalogArray[i])).save(function(err, doc){
if(err){
console.log(err);
} else {
i++;
}
});
if(i === catalogArray.length -1) return callback('database populated');
}
});
};
I have had a lot of problems trying to populate the database. Under previous scenarios (and this one), node pegs the processor and eventually runs out of memory. Note that in this scenario, I'm trying to allow Mongoose to save a record, and then iterate to the next record once the record saves.
But the iterator inside of the Mongoose save function never gets incremented. In addition, it never throws any errors. But if I put the iterator (i) outside of the asynchronous call to Mongoose, it will work, provided the number of records that I try to load are not too big (I have successfully loaded 2,000 this way).
So my questions are: Why isn't the iterator inside of the Mongoose save call ever incremented? And, more importantly, what is the best way to load a large data set into MongoDb using Mongoose?
Rob

i is your index to where you're pulling input data from in catalogArray, but you're also trying to use it to keep track of how many have been saved which isn't possible. Try tracking them separately like this:
var i = 0;
var saved = 0;
Catalog.remove({}, function(err){
while(i < catalogArray.length){
new Catalog(JSON.parse(catalogArray[i])).save(function(err, doc){
saved++;
if(err){
console.log(err);
} else {
if(saved === catalogArray.length) {
return callback('database populated');
}
}
});
i++;
}
});
UPDATE
If you want to add tighter flow control to the process, you can use the async module's forEachLimit function to limit the number of outstanding save operations to whatever you specify. For example, to limit it to one outstanding save at a time:
Catalog.remove({}, function(err){
async.forEachLimit(catalogArray, 1, function (catalog, cb) {
new Catalog(JSON.parse(catalog)).save(function (err, doc) {
if (err) {
console.log(err);
}
cb(err);
});
}, function (err) {
callback('database populated');
});
}

Rob,
The short answer:
You created an infinite loop. You're thinking synchronously and with blocking, Javascript functions asynchronously and without blocking. What you are trying to do is like trying to directly turn the feeling of hunger into a sandwich. You can't. The closest thing is you use the feeling of hunger to motivate you to go to the kitchen and make it. Don't try to make Javascript block. It won't work. Now, learn async.forEachLimit. It will work for what you want to do here.
You should probably review asynchronous design patterns and understand what it means on a deeper level. Callbacks are not simply an alternative to return values. They are fundamentally different in how and when they are executed. Here is a good primer: http://cs.brown.edu/courses/csci1680/f12/handouts/async.pdf
The long answer:
There is an underlying problem here, and that is your lack of understanding of what non-blocking IO and asynchronous means. Im not sure if you are breaking into node development, or this is just a one-off project, but if you do plan to continue using node (or any asynchronous language) then it is worth the time to understand the difference between synchronous and asynchronous design patterns, and what motivations there are for them. So, that is why you have a logic error putting the loop invariant increment inside an asynchronous callback which is creating an infinite loop.
In non-computer science, that means that your increment to i will never occur. The reason is because Javascript executes a single block of code to completion before any asynchronous callbacks are called. So in your code, your loop will run over and over, without i ever incrementing. And, in the background, you are storing the same document in mongo over and over. Each iteration of the loop starts sending document with index 0 to mongo, the callback can't fire until your loop ends, and all other code outside the loop runs to completion. So, the callback queues up. But, your loop runs again since i++ is never executed (remember, the callback is queued until your code finishes), inserting record 0 again, queueing another callback to execute AFTER your loop is complete. This goes on and on until your memory is filled with callbacks waiting to inform your infinite loop that document 0 has been inserted millions of times.
In general, there is no way to make Javascript block without doing something really really bad. For example, something paramount to setting your kitchen on fire to fry some eggs for that sandwich I talked about in the "short answer".
My advice is to take advantage of libs like async. https://github.com/caolan/async JohnnyHK mentioned it here, and he was correct for doing so.

Preventing database-related race conditions in Node.js

Overview
I am attempting to understand how to ensure aynchronous safety when using an instance of a model when using Node.js. Here, I use the Mongoose ODM in code samples, but the question applies to any case where a database is used with the asynchronous event-driven I/O approach that Node.js employs.
Consider the following code (which uses Mongoose for MongoDB queries):
Snippet A
MyModel.findOne( { _id : <id #1> }, function( err, doc ) {
MyOtherModel.findOne( { _id : someOtherId }, ( function(err, otherDoc ) {
if (doc.field1 === otherDoc.otherField) {
doc.field2 = 0; // assign some new value to a field on the model
}
doc.save( function() { console.log( 'success' ); }
});
});
In a separate part of the application, the document described by MyModel could be updated. Consider the following code:
Snippet B
MyModel.update( { _id : <id #1> }, { $set : { field1 : someValue }, callback );
In Snippet A, a MongoDB query is issued with a registered callback to be fired once the document is ready. An instance of the document described by MyModel is retained in memory (in the "doc" object). The following sequence could occur:
Snippet A executes
A query is initiated for MyModel, registering a callback (callback A) for later use
<< The Node event loop runs >>
MyModel is retrieved from the database, executing the registered callback (callback A)
A query is initiated for MyOtherModel, registering a callback for later use (callback B)
<< The Node event loop runs >>
Snippet B executes
The document (id #1) is updated
<< The Node event loop runs >>
MyOtherModel is retrieved from the database, executing the registered callback (callback B)
The stale version of the document (id #1) is incorrectly used in a comparison.
Questions
Are there any guarantees that this type of race condition won't happen in Node.js/MongoDB?
What can I do to deterministically prevent this scenario from happening?
While Node runs code in a single-threaded manner, it seems to me that any allowance of the event loop to run opens the door for potentially stale data. Please correct me if this observance is wrong.

No, there are no guarantees that this type of race condition won't occur in node.js/MongoDB. It doesn't have anything to do with node.js though, and this is possible with any database that supports concurrent access, not just MongoDB.
The problem is, however, trickier to solve with MongoDB because it doesn't support transactions like your typical SQL database would. So you have to solve it in your application layer using a strategy like the one outlined in the MongoDB cookbook here.

Redis: How to check if exists in while loop

I'm using Redis in my application and one thing is not clear for me. I save an object with a random generated string as its key. However I would like to check if that key exists. I am planning to use while loop however I am not sure how would I struct it according to Redis. Since if I would like to check for once, I would do;
redisClient.get("xPQ", function(err,result){
if(result==null)
exists = false
});
But I would like use the while loop as;
while(exists == false)
However I cannot build the code structure in my head. Would the while be inside the function or outside the function?

In general, you shouldn't check for existence of a key on the client side. It leads to race conditions. For example, another thread could insert the key after the first thread checked for its presence.
You should use the commands ending with NX. For example - SETNX and HSETNX. These will insert the key only if doesn't already exist. It is guaranteed to be atomic.

I do not understand why you need to implement active polling to check whether a key exists (there are much better ways to handle this kind of situations), but I will try to answer the question.
You should not use a while loop at all (inside or outside the function). Because of the asynchronous nature of node.js, these loops are better implemented using tail recursion. Here is an example:
var redis = require('redis')
var rc = redis.createClient(6379, 'localhost');
function wait_for_key( key, callback ) {
rc.get( key, function(err,result) {
if ( result == null ) {
console.log( "waiting ..." )
setTimeout( function() {
wait_for_key(key,callback);
}, 100 );
} else {
callback(key,result);
}
});
}
wait_for_key( "xPQ", function(key,value) {
console.log( key+" exists and its value is: "+value )
});
There are multiple ways to simplify these expressions using dedicated libraries (using continuation passing style, or fibers). For instance you may want to check the whilst and until functions of the async.js package.
https://github.com/caolan/async

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string