I need to implement a batch job to migrate data from one MongoDB collection to another.
I want to do it asynchronously. Please look at the following example:
private void runBatch() {
mongo.count("clients", new JsonObject(), countResult -> {
if (countResult.succeeded()) {
int count = countResult.result().intValue();
for (int i = 0; i < count; i+=200) {
final int skip = i;
final int limit = 200; // setLimit() takes a page size, not an end index
// create a new promise to process every 200 clients
vertx.executeBlocking(promise -> {
FindOptions options = new FindOptions();
options.setSkip(skip);
options.setLimit(limit);
options.setSort(new JsonObject().put("_id", 1));
mongo.findWithOptions("clients", query, options, findResult -> {
List<JsonObject> clients = findResult.result();
for (JsonObject client : clients) {
mongo.save("best_clients", client, saveResult -> {
// TODO add logging
});
}
});
}, result -> {
// TODO add logging
});
}
}
});
}
Inside executeBlocking I'm running a lot of async methods.
I'm really confused: should I call promise.complete() somewhere?
Or maybe my approach is incorrect, and instead of handling 200 objects inside one promise I should pass one object per promise?
A promise is usually used to represent the eventual completion or failure of an asynchronous operation; then, based on whether the promise succeeds or fails, some follow-up operation is carried out (logging, acknowledging that some operation was carried out, etc.).
It defeats the purpose of using a promise if it is never completed or its value is never used. In your snippet, the result handler passed to executeBlocking will only run once promise.complete() or promise.fail() is called, so you would call it after all the saves for that batch have finished.
Also remember that a promise that has not been completed can still be completed later (possibly by another thread, depending on the implementation), which could lead to unexpected behavior. So it's recommended to always complete promises.
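The snippet in the question is Java (Vert.x), but the completion rule is the same in any promise API. As a hedged illustration in JavaScript, the language used elsewhere on this page, here is a minimal sketch of the same paged migration. The in-memory arrays and the findPage()/save() helpers are hypothetical stand-ins for the MongoDB client, not Vert.x or MongoDB driver APIs. Each page's promise settles only after every save in that page has finished, which is the point at which you would call promise.complete() in the Vert.x version.

```javascript
// Hypothetical in-memory stand-ins for the "clients" and "best_clients"
// collections; a real database client would replace these.
const clients = Array.from({ length: 450 }, (_, i) => ({ _id: i }));
const bestClients = [];

// Hypothetical async helpers mimicking find-with-skip/limit and save.
function findPage(skip, limit) {
  return new Promise((resolve) =>
    setImmediate(() => resolve(clients.slice(skip, skip + limit))));
}
function save(doc) {
  return new Promise((resolve) =>
    setImmediate(() => { bestClients.push(doc); resolve(); }));
}

// The page's promise resolves only after every save in it has finished,
// the same role promise.complete() plays inside executeBlocking.
async function migratePage(skip, pageSize) {
  const page = await findPage(skip, pageSize);
  await Promise.all(page.map((doc) => save(doc)));
  return page.length;
}

// Pages run one after another, so the whole job has a clear end point.
async function runBatch(pageSize = 200) {
  let migrated = 0;
  for (let skip = 0; skip < clients.length; skip += pageSize) {
    migrated += await migratePage(skip, pageSize);
  }
  return migrated;
}

runBatch().then((n) => console.log(`migrated ${n} documents`));
```

Running the pages sequentially keeps memory and connection usage bounded; within a page the saves run concurrently and are awaited together.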
I know the sqlite (not sqlite3) module does requests in batches. I need to grab a value out of a database, assign it, then do some processing with it. However, the function that fetches the value from the database isn't returning until after all the processing has taken place.
I need one of 3 things to happen:
The function to return right away
Halt the code until myEvent.id has been assigned
Make sql.get return right away
myEvent.id = generateEventID();
///do stuff with myEvent.id
function generateEventID() {
return sql.get('SELECT * FROM settings WHERE name = "eventID"').then(row => {
if (!row) return message.reply("a database error occurred while generating an ID");
currentID = row.intValue + 1;
console.log("New eventID created: " + currentID);
sql.run(`UPDATE settings SET intValue = ${row.intValue + 1} WHERE name = "eventID"`);
return currentID;
});
}
Unfortunately, you can't return an ID straight away from the function, because sql.get(...) is asynchronous.
One way around this is to pass a callback function as a parameter to generateEventID() and call it once the ID is available, so that the code that depends on the ID only runs at that point.
Your code would then look like this:
var doStuffWithEventId = function(eventId) {
// Do more stuff.
}
generateEventID(doStuffWithEventId);
function generateEventID(callback) {
return sql.get('SELECT * FROM settings WHERE name = "eventID"').then(row => {
if (!row) return message.reply("a database error occurred while generating an ID");
const currentID = row.intValue + 1;
console.log("New eventID created: " + currentID);
sql.run(`UPDATE settings SET intValue = ${row.intValue + 1} WHERE name = "eventID"`);
return callback(currentID); // Call function doStuffWithEventId with value of currentID variable
});
}
This is one way of accomplishing what you're trying to do.
A few side notes:
You would generally wait for the UPDATE operation to complete before returning anything.
You are assuming that the ID always increments by 1. Databases don't always do that, so you can get caught off guard.
Your code assumes that the database operations will always succeed. You would typically want to handle potential exceptions that may be raised when making external calls.
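The first and last side notes can be addressed together with async/await. This is a minimal sketch, assuming a promise-returning get/run pair like the one the question uses; the `sql` object and its name-based signatures here are hypothetical in-memory stubs, not the sqlite module's API. Note that awaiting the UPDATE fixes the ordering but does not by itself stop two concurrent callers from reading the same value.

```javascript
// Hypothetical promise-based stand-in for the sqlite module's get/run.
// A real version would run the SELECT and UPDATE statements instead.
const settings = { eventID: 41 };
const sql = {
  get: async (name) => (name in settings ? { intValue: settings[name] } : undefined),
  run: async (name, value) => { settings[name] = value; },
};

// Reads the current value, waits for the UPDATE to finish, then returns
// the new ID. Any database failure rejects the returned promise.
async function generateEventID() {
  const row = await sql.get("eventID");
  if (!row) {
    throw new Error("a database error occurred while generating an ID");
  }
  const newID = row.intValue + 1;
  await sql.run("eventID", newID); // wait for the update before returning
  return newID;
}

// The caller waits for the ID before doing anything with it.
async function main() {
  try {
    const id = await generateEventID();
    console.log("New eventID created: " + id);
    // ...do stuff with id here...
  } catch (err) {
    console.error(err);
  }
}

main();
```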
As you already said, your code should provide three main functionalities in a particular order:
Obtain a result from your data storage (SQL database in this case).
Perform a particular operation on the data obtained (referred to as ///do stuff with myEvent.id in your snippet).
Perform an UPDATE query on your data storage.
Therefore, the implementation logic you provide is wrong, as it does not guarantee that the three operations happen in that particular order. Don't forget that database calls in Node are asynchronous, and in this case you are not enforcing the ordering you need. To fix this, you should perform the following steps:
Obtain a result from your data storage (already done).
Perform the operation on the data obtained. This must be done inside the sql.get() callback, or once the promise has resolved (since you are using promises in this case).
The function that performs the operation on your data (i.e. step 2) should return a new promise. Once that promise resolves, you can run your UPDATE, which ensures that your implementation performs the three steps in the required order.
Now, this is how I would do it:
generateEventID().catch(function(err){ console.error(err); });
function doMoreStuff(eventId){
return new Promise(function(resolve, reject){
var err = null; // set this if something goes wrong
//do your stuff with eventId
if(err){
reject(err)
}else{
resolve()
}
})
}
function generateEventID() {
return sql.get('SELECT * FROM settings WHERE name = "eventID"').then(row => {
if (!row) return message.reply("a database error occurred while generating an ID");
const currentID = row.intValue + 1;
console.log("New eventID created: " + currentID);
// Returning the chain lets errors from doMoreStuff or the UPDATE reach .catch()
return doMoreStuff(currentID).then(function(){ //At this point your operations with currentID will have finished already
return sql.run(`UPDATE settings SET intValue = ${row.intValue + 1} WHERE name = "eventID"`);
});
});
}
I really hope this helps!
Edit: Don't forget to implement error handling for promises with .catch().
var assert = require('assert');
var parseJSON = require('json-parse-async');
var contact = new Object();
contact.firstname = "Jesper";
contact.surname = "Aaberg";
contact.phone = ["555-0100", "555-0120"];
var contact2 = new Object();
contact2.firstname = "JESPER";
contact2.surname = "AABERG";
contact2.phone = ["555-0100", "555-0120"];
contact.toJSON = function(key) {
var replacement = new Object();
for (var val in this) {
if (typeof(this[val]) === 'string')
replacement[val] = this[val].toUpperCase();
else
replacement[val] = this[val]
}
return replacement;
};
var jsonText = JSON.stringify(contact);
contact = JSON.parse(jsonText);
console.log(contact);
console.log(contact2);
assert.deepEqual(contact, contact2, 'these two objects are the same');
What are the asynchronous equivalents of JSON.parse, JSON.stringify and assert.deepEqual? I am trying to create a race condition and non-deterministic behavior in the code above, but I have not been able to find non-blocking, asynchronous equivalents of those functions.
Node.js does not have an actual asynchronous JSON parser built in. If you want something that actually does the parsing outside the main Node.js JavaScript thread, you would have to find a third-party module that parses the JSON outside of the JavaScript thread (e.g. in a native code thread or in some other process). There are some modules on npm that advertise themselves as asynchronous, such as async-json-parser, async-json-parse or json-parse-async. You would have to verify that whichever implementation you are interested in is truly asynchronous (your JavaScript continues to run while the parsing happens in the background).
But, reading the details in your question about the problem you're trying to solve, it doesn't sound like you actually need a parser that truly runs in the background. To test what you're trying to test, it seems you just need an indeterminate finish that allows other code to run before the parsing finishes. That can be done by wrapping the synchronous JSON.parse() in a setTimeout() inside a promise, with a random delay. That gives other code a random amount of time to run (to try to expose your race conditions). That could be done like this:
JSON.parseAsyncRandom = function(str) {
return new Promise(function(resolve, reject) {
// use a random 0-10 second delay
setTimeout(function() {
try {
resolve(JSON.parse(str));
} catch(e) {
reject(e);
}
}, Math.floor(Math.random() * 10000));
});
}
JSON.parseAsyncRandom(str).then(function(obj) {
// process obj here
}, function(err) {
// handle err here
});
Note: This is not true asynchronous execution. It's an asynchronous result (in that it arrives some random time later and other code will run before the result arrives), but true asynchronous execution happens in the background in parallel with other JS running and this isn't quite that. But, given your comment that you just want variable and asynchronous results for testing purposes, this should do that.
I've recently faced this problem myself, so I decided to create a library to handle JSON parsing in a really asynchronous way.
The idea behind it is to divide the parsing process into chunks and run each chunk separately in the event loop, so that other events (user interactions, etc.) can still be handled within a few milliseconds, keeping the UI interactive.
If you are interested, the library is called RAJI and you can find it here: https://github.com/federico-terzi/raji
After installing RAJI, you can then convert your synchronous JSON.parse calls into async raji.parse calls, such as:
const object = await parse(payload);
These calls won't block the UI
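RAJI's own implementation is not reproduced here. As a hedged sketch of the general chunking idea only, the example below assumes a simpler input than RAJI handles: many small JSON strings rather than one large document. parseInChunks() and yieldToEventLoop() are hypothetical names, and setImmediate() is Node-specific (a browser version would need a different way to yield, such as setTimeout).

```javascript
// Yields control back to the event loop so timers, I/O and other
// callbacks can run between chunks.
function yieldToEventLoop() {
  return new Promise((resolve) => setImmediate(resolve));
}

// Parses many small JSON strings, chunkSize at a time, pausing between
// chunks. Each JSON.parse call is still synchronous; what changes is that
// no single stretch of work holds the thread for the whole input.
async function parseInChunks(jsonStrings, chunkSize = 1000) {
  const results = [];
  for (let i = 0; i < jsonStrings.length; i += chunkSize) {
    for (const s of jsonStrings.slice(i, i + chunkSize)) {
      results.push(JSON.parse(s));
    }
    await yieldToEventLoop();
  }
  return results;
}

// Usage: a callback scheduled up front gets a chance to run mid-parse.
const input = Array.from({ length: 5000 }, (_, i) => JSON.stringify({ id: i }));
setImmediate(() => console.log("other work ran during parsing"));
parseInChunks(input, 500).then((objs) => console.log(`parsed ${objs.length} objects`));
```

The chunk size is the trade-off: smaller chunks keep the thread more responsive, larger chunks finish sooner overall.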
You can use bluebird to wrap these calls in promises, as in the example below.
I wrote the code below using JavaScript ES6.
const Promise = require('bluebird')
function stringifyPromise(jsonText) {
return Promise.try(() => JSON.stringify(jsonText))
}
function parsePromise(str) {
return Promise.try(() => JSON.parse(str))
}
stringifyPromise(contact)
.then(jsonText => parsePromise(jsonText))
.then(contact => {
assert.deepEqual(contact, contact2, 'these two objects are the same')
})
.catch(err => console.error(err))
I have an OrientDB database. I want to use Node.js with RESTful calls to create a large number of records. I need to get the #rid of each for some later processing.
My pseudocode is:
for each record
write.to.db(record)
when the async of write.to.db() finishes
process based on #rid
carryon()
I have ended up in serious callback hell with this. The version that came closest used tail recursion in the .then function to write the next record to the db. However, I couldn't carry on with the rest of the processing.
A final constraint is that I am behind a corporate proxy and cannot use any other packages without going through the network administrator, so using only native Node.js modules is essential.
Any suggestions?
With a completion callback, the general design pattern for this type of problem makes use of a local function for doing each write:
var records = ....; // array of records to write
var index = 0;
function writeNext(r) {
write.to.db(r, function(err) {
if (err) {
// error handling
} else {
++index;
if (index < records.length) {
writeNext(records[index]);
}
}
});
}
writeNext(records[0]);
The key here is that you can't use synchronous iterators like .forEach(), because they won't wait for one write to complete before starting the next. Instead, you do your own iteration.
If your write function returns a promise, you can use the .reduce() pattern that is common for iterating an array.
var records = ...; // some array of records to write
records.reduce(function(p, r) {
return p.then(function() {
return write.to.db(r);
});
}, Promise.resolve()).then(function() {
// all done here
}, function(err) {
// error here
});
This solution chains promises together, waiting for each one to resolve before executing the next save.
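Since the question rules out third-party packages, the same one-at-a-time behavior can also be written with built-in async/await (available in modern Node.js), which also makes it easy to collect each #rid. A minimal sketch, assuming the write returns a promise of the new record's #rid; writeToDb() and the rid format are hypothetical stand-ins for the real REST call.

```javascript
// Hypothetical stand-in for write.to.db(): resolves with a fake #rid after
// a short delay. A real version would make the REST call (for example with
// Node's built-in http module) and read the rid from the response.
let nextPosition = 0;
function writeToDb(record) {
  return new Promise((resolve) =>
    setTimeout(() => resolve(`#12:${nextPosition++}`), 5));
}

// Writes records strictly one at a time and collects each #rid in order.
async function writeAll(records) {
  const rids = [];
  for (const record of records) {
    const rid = await writeToDb(record); // waits before starting the next write
    rids.push(rid);
    // ...per-record processing based on rid could go here...
  }
  return rids;
}

// "carryon()" happens after the loop, once every write has finished.
writeAll([{ name: "a" }, { name: "b" }, { name: "c" }])
  .then((rids) => console.log("all written:", rids))
  .catch((err) => console.error("a write failed:", err));
```

If any write rejects, the loop stops at that record and the error reaches the .catch() handler.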
It's hard to tell which function would be best for your scenario without more detail, but I almost always use async.js for this kind of thing.
From what you say, one way to do it would be with async.map:
var async = require('async');
var recordsToCreate = [...];
function functionThatCallsTheApi(record, cb){
// do the api call, then call cb(null, rid)
}
async.map(recordsToCreate, functionThatCallsTheApi, function(err, results){
// here, err will be if anything failed in any function
// results will be an array of the rids
});
You can also look at other functions in the library that limit how many calls run at once (such as async.mapLimit), which is probably a good idea.
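Because the asker cannot install async.js, here is a hedged sketch of a throttled map built only on native promises. mapLimit() below is a hypothetical helper similar in spirit to async.mapLimit, not the library's implementation, and fakeApiCall() stands in for the real API call. Note that if one call rejects, the returned promise rejects, but calls already in flight are not cancelled.

```javascript
// Runs iteratee over items with at most `limit` calls in flight at once,
// returning results in the original order.
async function mapLimit(items, limit, iteratee) {
  if (!(limit >= 1)) throw new RangeError("limit must be at least 1");
  const results = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await iteratee(items[index], index);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, worker);
  await Promise.all(workers);
  return results;
}

// Usage with a hypothetical API call that resolves to a rid after a delay.
const fakeApiCall = (record, i) =>
  new Promise((resolve) => setTimeout(() => resolve(`rid-${i}`), 10));

mapLimit(["r1", "r2", "r3", "r4", "r5"], 2, fakeApiCall)
  .then((rids) => console.log(rids));
```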
How do I ensure that SQLite doesn't interleave queries from multiple concurrent node.js/Express HTTP requests into a single transaction?
I would want DB queries from different requests to be executed in separate transactions, isolated from each other, allowing each to be committed or rolled back independently of each other. Given node.js's single-thread (and single-DB-connection?) characteristics this seems particularly problematic.
I've been trawling through dozens of web pages and docs and haven't found any clear (to me) explanation of how (or whether) this scenario is handled. For instance the description of db.serialize() only states that one query will be executed at a time - it says nothing about the separation of queries belonging to different transactions.
Any pointers appreciated!
I faced the exact same issue. Here is how I solved it:
Step 1: Ensure all database access is done via a set of library functions, where the database is passed in as the first input parameter.
Step 2: When you create the database object (using new sqlite3.Database('path')), wrap the object with some extra meta-information:
Here is how I wrapped mine:
var wrap = { db_obj : new sqlite3.Database('path'), locked : "UNLOCKED", work_queue : [] }
Step 3: Create a function that will be used to run db functions. If the database is locked, this function will defer their execution until later. I called mine "runAsync":
function runAsync( db, fn, callBack) {
if (db.locked === "LOCKED" ) {
db.work_queue.push( function() { runAsync( db, fn, callBack ); });
return;
}
fn(db,callBack);
}
We can see that this function checks the state of the wrapped db object. If it is locked, we defer execution by storing the function in the "work queue" to be executed later (by doing fn()).
Step 4: Create a function that will be used to run db functions exclusively. If the database is locked, this function will defer execution until later, else it will lock the database, run its function and then unlock the database. I called mine: runAsyncExclusive():
function runAsyncExclusive( db, fn, callBack ) {
if(db.locked === "LOCKED") {
db.work_queue.push( function() { runAsyncExclusive( db, fn, callBack) });
return;
}
var exclusiveDb = {
db_obj : db.db_obj,
locked : "UNLOCKED",
work_queue : []
};
db.locked = "LOCKED";
fn(exclusiveDb, function(err,res) {
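db.locked = "UNLOCKED";
// Swap in a fresh queue before draining, so items that get re-queued
// (because one of them locks the db again) are not lost
var workItems = db.work_queue;
db.work_queue = [];
workItems.forEach(function(item) { item(); });
callBack(err,res);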
})
}
The exclusiveDb object that is passed into the function will allow exclusive access to the db. This object can itself be locked, allowing for arbitrarily deep nested locking.
Step 5: Adjust your library functions so that they call runAsync() and runAsyncExclusive() where appropriate:
function get(db,sql,params,callBack) {
db.get(sql,params,callBack);
}
Becomes...
function get(db,sql,params,callBack) {
runAsync( db, function(db,cb) { db.db_obj.get(sql,params,cb); }, callBack );
}
Voila!
(Apologies if this wasn't clear enough)
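A different technique, not the work-queue approach above, that gives the same "one transaction at a time" guarantee is a promise-chain lock. This is a minimal sketch under the assumption that every transactional unit of work goes through runExclusive(); createLock(), transaction() and step() are hypothetical names, not part of the sqlite3 API, and the timers stand in for BEGIN, queries and COMMIT. It only serializes work within one Node.js process, and queries that bypass the lock on the same connection can still interleave.

```javascript
// Every exclusive section is chained onto the previous one, so sections
// never interleave even though each performs several async steps.
function createLock() {
  let tail = Promise.resolve();
  return function runExclusive(fn) {
    const result = tail.then(() => fn());
    // Keep the chain alive even if fn rejects, so later sections still run.
    tail = result.catch(() => {});
    return result;
  };
}

// Hypothetical async steps standing in for real database calls.
const runExclusive = createLock();
const log = [];
const step = (msg) =>
  new Promise((resolve) => setTimeout(() => { log.push(msg); resolve(); }, 5));

async function transaction(name) {
  await step(`${name}: BEGIN`);
  await step(`${name}: query`);
  await step(`${name}: COMMIT`);
}

// Both requests arrive together, but B only starts after A has committed.
Promise.all([
  runExclusive(() => transaction("A")),
  runExclusive(() => transaction("B")),
]).then(() => console.log(log));
```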
I am creating an array of JSON objects which is then stored in MongoDB.
Each JSON object contains a number of fields, each of which is populated before I save the object to MongoDB.
Some of the object's attributes are populated by making API calls to other websites such as last.fm, but the responses don't arrive quickly enough to populate those attributes before the object is saved to MongoDB.
How can I wait for all attributes of an object to be populated before saving it? I did try async.waterfall, but it still falls through without waiting, and I end up with a database full of documents with empty fields.
Any help would be greatly appreciated.
Thanks :)
You have a few options for controlling asynchrony in JavaScript:
Callback pattern: (http://npmjs.org/async) async.parallel([...], function (err) { ... })
Promises: (http://npmjs.org/q) Q.all([...]).then(function () { ... })
Streams: (http://npmjs.org/concat-stream) see also https://github.com/substack/stream-handbook
Since you say you are making multiple API calls to other websites, you may want to try:
async.each(api_requests,
function(api_request, cb) {
request(api_request, function (error, response, body) {
/* code */
/* add to model for Mongo */
cb(error); // pass any request error on to the final callback
});
},
function(err) {
// continue execution after all cbs are received
/* code */
/* save to Mongo, etc.. */
}
);
The above example is most applicable when you are making numerous requests following the same format. Please review the documentation for Waterfall (https://github.com/caolan/async#waterfall) if the input into your next step depends on the output of the previous step or Parallel (https://github.com/caolan/async#parallel) if you have a bunch of unrelated tasks that don't rely on each other. The great thing about async is that you can nest and string all the functions together to support what you're trying to do.
You'll either want to use promises or some sort of callback mechanism. Here's an example of the promise method with jPromise:
var jPromise = require('jPromise');
var promises = [];
for(var i = 0; i < 10; i++) {
promises.push(someAsyncApiCall(i));
}
jPromise.when(promises).then(function() {
saveThingsToTheDb();
});
Similarly, without the promise library:
var finished = 0;
var toDo = 10;
function allDone() {
saveThingsToTheDb();
}
for(var i = 0; i < toDo; i++) {
someAsyncApiCall(function() {
finished++;
if(finished === toDo) {
allDone();
}
});
}
Personally, I prefer the promise method, but that only works well if the API you're calling returns some sort of promise. If it doesn't, you're SOL and will have to wrap the callback API with promises somehow (Q does this pretty well).
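For callback APIs that follow the Node convention `callback(err, result)`, Node's built-in util.promisify can do that wrapping without any extra library. A minimal sketch: fetchArtistInfo() is a hypothetical callback-style API call and saveToDb is a placeholder for the MongoDB save; the point is that the document is only built and saved after Promise.all has every result.

```javascript
const util = require("util");

// Hypothetical callback-style API call (Node convention: callback(err, result)).
function fetchArtistInfo(artist, callback) {
  setTimeout(() => {
    if (!artist) return callback(new Error("artist name required"));
    callback(null, { artist, fetched: true });
  }, 10);
}

// util.promisify turns the callback API into one that returns a promise.
const fetchArtistInfoAsync = util.promisify(fetchArtistInfo);

// Fill every attribute first; only then build and save the document.
async function buildAndSave(artists, saveToDb) {
  const infos = await Promise.all(artists.map((a) => fetchArtistInfoAsync(a)));
  const doc = { artists: infos };
  await saveToDb(doc); // runs only after all API calls have resolved
  return doc;
}

// Usage with a placeholder save function standing in for MongoDB.
buildAndSave(["artist-1", "artist-2"], async (doc) => console.log("saving", doc))
  .catch((err) => console.error("not saved:", err.message));
```

If any API call fails, Promise.all rejects and the save is skipped, so no half-filled documents reach the database.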