How to sketch out an event-driven system?

How to sketch out an event-driven system? - node.js

I'm trying to design a system in Node.js (an attempt at solving one of my earlier problems, using Node's concurrency) but I'm running into trouble figuring out how to draw a plan of how the thing should operate.
I'm getting very tripped up thinking in terms of callbacks instead of returned values. The flow isn't linear, and it's really boggling my ability to draft it. How does one draw out an operational flow for an event-driven system?
I need something I can look at and say "Ok, yes, that's how it will work. I'll start it over here, and it will give me back these results over here."
Pictures would be very helpful for this one. Thanks.
Edit: I'm looking for something more granular than UML, specifically, something that will help me transition from a blocking and object-oriented programming structure, where I'm comfortable, to a non-blocking and event driven structure, where I'm unfamiliar.

Based on http://i.stack.imgur.com/9DDQP.png what you need is a good flow library that allows you to pipeline sync and async calls in node.
One such library is https://github.com/isaacs/slide-flow-control (look at the slide preso there too) and the code outline for what you need to do is below.
It is self documenting and as you see it is quite concise, pure nodejs, uml, img's etc. not required.
var chain = require("slide/chain")
, asyncMap = require("slide/async-map")
;
// start processing
main_loop(function() {
console.log("its done"); // when finished
});
function main_loop(cb) {
var res = [];
// each entry in chain below fires sequentially i.e. after
// the previous function completes
chain
( [ [start_update_q, "user-foo"]
, [get_followed_users, chain.last]
, [get_favorites, chain.last]
, [calc_new_q]
, [push_results, chain.last]
]
, res
, cb
)
}
function get_favorites(users, cb) {
function fn(user, cb_) {
get_one_users_favorites(user, cb_);
}
// this will run thru get_favorites in parallel
// and after all user favorites are gotten it will fire
// callback cb
asyncMap(users, fn, cb);
}
// code in the various functions in chain here,
// remember to either return the callback on completion.
// or pass it as an arg to the async call you make within the
// function as above i.e. asyncMap will fire cb on completion

UML might be appropriate. I'd look at the behavior diagrams.

Related

Node Js Complex Design Principle (Promise, async/await)

This is a common process for me in my previous works, so i usually have a very complex use case take for example
async function doThis(){
for (100x) {
try {
insertToDatabase()
await selectAndManipulateData()
createEmailWorker()
/** and many more **/
} catch {
logToAFile()
}
}
}
The code works, but its complicated 1 function doing all the things, the only reason i do this is because i can verify in real time if one function fails i can make sure the other function wont run so there wont be any incorrect data.
What i want to know is, what is the best architecture in defining a project structure that is not sacrificing the data integrity? (or is it already good enough?)

const doThis = async() => {
try {
for (100x) {
await insertToDatabase();
await selectAndManipulateData();
await createEmailWorker();
/** and many more **/
}
}
catch {
await logToAFile();
}
}
The best way of doing this is, you should always use await to call any function and make sure to with es6 syntax's as it gives a lot more feature. Your function should always be an async.
Always put your loop in try catch as it will give you any error in catch and it will calling function specific.

Actually, I would separate the persistence, manipulation and email jobs. Consider storing your data is a single responsibility. In addition to this, your modification and email workers should work as scheduled jobs. Once the jobs triggered, they should check if there is data related to its responsibility.
Another way is changing these scheduled jobs with triggered jobs. You can build a chain of responsibility that triggers next jobs and they would decide to work or not.

range.address throws context related errors

We've been developing using Excel JavaScript API for quite a few months now. We have been coming across context related issues which got resolved for unknown reasons. We weren't able to replicate these issues and wondered how they got resolved. Recently similar issues have started popping up again.
Error we consistently get:
property 'address' is not available. Before reading the property's
value, call the load method on the containing object and call
"context.sync()" on the associated request context.
We thought as we have multiple functions defined to modularise code in project, may be context differs somewhere among these functions which has gone unnoticed. So we came up with single context solution implemented via JavaScript Module pattern.
var ContextManager = (function () {
var xlContext;//single context for entire project/application.
function loadContext() {
xlContext = new Excel.RequestContext();
}
function sync(object) {
return (object === undefined) ? xlContext.sync() : xlContext.sync(object);
}
function getWorksheetByName(name) {
return xlContext.workbook.worksheets.getItem(name.toString());
}
//public
return {
loadContext: loadContext,
sync: sync,
getWorksheetByName: getWorksheetByName
};
})();
NOTE: above code shortened. There are other methods added to ensure that single context gets used throughout application.
While implementing single context, this time round, we have been able to replicate the issue though.
Office.initialize = function (reason) {
$(document).ready(function () {
ContextManager.loadContext();
function loadRangeAddress(rng, index) {
rng.load("address");
ContextManager.sync().then(function () {
console.log("Address: " + rng.address);
}).catch(function (e) {
console.log("Failed address for index: " + index);
});
}
for (var i = 1; i <= 1000; i++) {
var sheet = ContextManager.getWorksheetByName("Sheet1");
loadRangeAddress(sheet.getRange("A" + i), i);//I expect to see a1 to a1000 addresses in console. Order doesn't matter.
}
});
}
In above case, only "A1" gets printed as range address to console. I can't see any of the other addresses (A2 to A1000)being printed. Only catch block executes. Can anyone explain why this happens?
Although I've written for loop above, that isn't my use case. In real use case, such situations occur where one range object in function a needs to load range address. In the mean while another function b also wants to load range address. Both function a and function b work asynchronously on separate tasks such as one creates table object (table needs address) and other pastes data to sheet (there's debug statement to see where data was pasted).
This is something our team hasn't been able to figure out or find a solution for.

There is a lot packed into this code, but the issue you have is that you're calling sync a whole bunch of times without awaiting the previous sync.
There are several problems with this:
If you were using different contexts, you would actually see that there is a limit of ~50 simultaneous requests, after which you'll get errors.
In your case, you're running into a different (and almost opposite) problem. Given the async nature of the APIs, and the fact that you're not awaiting on the sync-s, your first sync request (which you'd think is for just A1) will actually contain all the load requests from the execution of the entire for loop. Now, once this first sync is dispatched, the action queue will be cleared. Which means that your second, third, etc. sync will see that there is no pending work, and will no-op, executing before the first sync ever came back with the values!
[This might be considered a bug, and I'll discuss with the team about fixing it. But it's still a very dangerous thing to not await the completion of a sync before moving on to the next batch of instructions that use the same context.]
The fix is to await the sync. This is far and away the simplest to do in TypeScript 2.1 and its async/await feature, otherwise you need to do the async version of the for loop, which you can look up, but it's rather unintuitive (requires creating an uber-promise that keeps chaining a bunch of .then-s)
So, your modified TypeScript-ified code would be
ContextManager.loadContext();
async function loadRangeAddress(rng, index) {
rng.load("address");
await ContextManager.sync().then(function () {
console.log("Address: " + rng.address);
}).catch(function (e) {
OfficeHelpers.Utilities.log(e);
});
}
for (var i = 1; i <= 1000; i++) {
var sheet = ContextManager.getWorksheetByName("Sheet1");
await loadRangeAddress(sheet.getRange("A" + i), i);//I expect to see a1 to a1000 addresses in console. Order doesn't matter.
}
Note the async in front of the loadRangeAddress function, and the two await-s in front of ContextManager.sync() and loadRangeAddress.
Note that this code will also run quite slowly, as you're making an async roundtrip for each cell. Which means you're not using batching, which is at the very core of the object-model for the new APIs.
For completeness sake, I should also note that creating a "raw" RequestContext instead of using Excel.run has some disadvantages. Excel.run does a number of useful things, the most important of which is automatic object tracking and un-tracking (not relevant here, since you're only reading back data; but would be relevant if you were loading and then wanting to write back into the object).
Finally, if I may recommend (full disclosure: I am the author of the book), you will probably find a good bit of useful info about Office.js in the e-book "Building Office Add-ins using Office.js", available at https://leanpub.com/buildingofficeaddins. In particular, it has a very detailed (10-page) section on the internal workings of the object model ("Section 5.5: Implementation details, for those who want to know how it really works"). It also offers advice on using TypeScript, has a general Promise/async-await primer, describes what .run does, and has a bunch more info about the OM. Also, though not available yet, it will soon offer information on how to resume using the same context (using a newer technique than what was originally described in How can a range be used across different Word.run contexts?). The book is a lean-published "evergreen" book, son once I write the topic in the coming weeks, an update will be available to all existing readers.
Hope this helps!

how to use marklogic query results in node.js

I have looked around quite extensivly and could not find any example on how to use query results from the marklogic module inside node.js...
Most examples do a console.log() of results and thats it, but what if I need the query results (say in a JSON array and use these results later on?
Seems I am missing some node.js ascynch stuff here...
Example :
var marklogic = require('marklogic');
var my = require('./my-connection.js');
var db = marklogic.createDatabaseClient(my.connInfo);
var qb = marklogic.queryBuilder;
db.documents.query(
qb.where(qb.parsedFrom('oslo'))
).result( function(results) {
console.log(JSON.stringify(results, null, 2));
});
// I would like to use the results here
// console.log(JSON.stringify(results, null, 2))
Now question is I would like to use the results object later on in this script. I have tried using .then(), or passing it to a variable and returning that variable but no luck.
Regards,
hugo

Simple answer: you need to continue your business logic from the result() callback.
In more detail, your goal is to do something with the result of an asynchronous computation or request. Since JS has no native async capabilities (such as threads), callbacks are typically used to resume an operation asynchronously. The most important thing to realize is that you cannot return the result of an async computation or request, but must resume control flow after it completes. Defining lots of functions can help make this kind of code easier to read and understand.
This example illustrates what's happening:
process.nextTick(function() {
console.log('second')
})
console.log('first')
That program will log first, then second, because process.nextTick() asynchronously invokes the callback function provided to it (on the next turn of the event loop).
The answers at How do I get started with Node.js offer lots of resources to better understand async programming with node.js.

Webworker-threads: is it OK to use "require" inside worker?

(Using Sails.js)
I am testing webworker-threads ( https://www.npmjs.com/package/webworker-threads ) for long running processes on Node and the following example looks good:
var Worker = require('webworker-threads').Worker;
var fibo = new Worker(function() {
function fibo (n) {
return n > 1 ? fibo(n - 1) + fibo(n - 2) : 1;
}
this.onmessage = function (event) {
try{
postMessage(fibo(event.data));
}catch (e){
console.log(e);
}
}
});
fibo.onmessage = function (event) {
//my return callback
};
fibo.postMessage(40);
But as soon as I add any code to query Mongodb, it throws an exception:
(not using the Sails model in the query, just to make sure the code could run on its own -- db has no password)
var Worker = require('webworker-threads').Worker;
var fibo = new Worker(function() {
function fibo (n) {
return n > 1 ? fibo(n - 1) + fibo(n - 2) : 1;
}
// MY DB TEST -- THIS WORKS FINE OUTSIDE THE WORKER
function callDb(event){
var db = require('monk')('localhost/mydb');
var users = db.get('users');
users.find({ "firstName" : "John"}, function (err, docs){
console.log(("serviceSuccess"));
return fibo(event.data);
});
}
this.onmessage = function (event) {
try{
postMessage(callDb(event.data)); // calling db function now
}catch (e){
console.log(e);
}
}
});
fibo.onmessage = function (event) {
//my return callback
};
fibo.postMessage(40);
Since the DB code works perfectly fine outside the Worker, I think it has something to do with the require. I've tried something that also works outside the Worker, like
var moment = require("moment");
var deadline = moment().add(30, "s");
And the code also throws an exception. Unfortunately, console.log only shows this for all types of errors:
{Object}
{/Object}
So, the questions are: is there any restriction or guideline for using require inside a Worker? What could I be doing wrong here?
UPDATE
it seems Threads will not allow external modules
https://github.com/xk/node-threads-a-gogo/issues/22
TL:DR I think that if you need to require, you should use a node's
cluster or child process. If you want to offload some cpu busy work,
you should use tagg and the load function to grab any helpers you
need.
Upon reading this thread, I see that this question is similar to this one:
Load Nodejs Module into A Web Worker
To which Audreyt, the webworker-threads author answered:
author of webworker-threads here. Thank you for using the module!
There is a default native_fs_ object with the readFileSync you can use
to read files.
Beyond that, I've mostly relied on onejs to compile all required
modules in package.json into a single JS file for importScripts to
use, just like one would do when deploying to a client-side web worker
environment. (There are also many alternatives to onejs -- browserify,
etc.)
Hope this helps!
So it seems importScripts is the way to go. But at this point, it might be too hacky for what I want to do, so probably KUE is a more mature solution.

I'm a collaborator on the node-webworker-threads project.
You can't require in node-webworker-threads
You are correct in your update: node-webworker-threads does not (currently) support requireing external modules.
It has limited support for some of the built-ins, including file system calls and a version of console.log. As you've found, the version of console.log implemented in node-webworker-threads is not identical to the built-in console.log in Node.js; it does not, for example, automatically make nice string representations of the components of an Object.
In some cases you can use external modules, as outlined by audreyt in her response. Clearly this is not ideal, and I view the incomplete require as the primary "dealbreaker" of node-webworker-threads. I'm hoping to work on it this summer.
When to use node-webworker-threads
node-webworker-threads allows you to code against the WebWorker API and run the same code in the client (browser) and the server (Node.js). This is why you would use node-webworker-threads over node-threads-a-gogo.
node-webworker-threads is great if you want the most lightweight possible JavaScript-based workers, to do something CPU-bound. Examples: prime numbers, Fibonacci, a Monte Carlo simulation, offloading built-in but potentially-expensive operations like regular expression matching.
When not to use node-webworker-threads
node-webworker-threads emphasizes portability over convenience. For a Node.js-only solution, this means that node-webworker-threads is not the way to go.
If you're willing to compromise on full-stack portability, there are two ways to go: speed and convenience.
For speed, try a C++ add-on. Use NaN. I recommend Scott Frees's C++ and Node.js Integration book to learn how to do this, it'll save you a lot of time. You'll pay for it in needing to brush up on your C++ skills, and if you want to work with MongoDB then this probably isn't a good idea.
For convenience, use a Child Process-based worker pool like fork-pool. In this case, each worker is a full-fledged Node.js instance. You can then require to your heart's content. You'll pay for it in a larger application footprint and in higher communication costs compared to node-webworker-threads or a C++ add-on.

To async, or not to async in node.js?

I'm still learning the node.js ropes and am just trying to get my head around what I should be deferring, and what I should just be executing.
I know there are other questions relating to this subject generally, but I'm afraid without a more relatable example I'm struggling to 'get it'.
My general understanding is that if the code being executed is non-trivial, then it's probably a good idea to async it, as to avoid it holding up someone else's session. There's clearly more to it than that, and callbacks get mentioned a lot, and I'm not 100% on why you wouldn't just synch everything. I've got some ways to go.
So here's some basic code I've put together in an express.js app:
app.get('/directory', function(req, res) {
process.nextTick(function() {
Item.
find().
sort( 'date-modified' ).
exec( function ( err, items ){
if ( err ) {
return next( err );
}
res.render('directory.ejs', {
items : items
});
});
});
});
Am I right to be using process.nextTick() here? My reasoning is that as it's a database call then some actual work is having to be done, and it's the kind of thing that could slow down active sessions. Or is that wrong?
Secondly, I have a feeling that if I'm deferring the database query then it should be in a callback, and I should have the actual page rendering happening synchronously, on condition of receiving the callback response. I'm only assuming this because it seems like a more common format from some of the examples I've seen - if it's a correct assumption can anyone explain why that's the case?
Thanks!

You are using it wrong in this case, because .exec() is already asynchronous (You can tell by the fact that is accepts a callback as a parameter).
To be fair, most of what needs to be asynchronous in nodejs already is.
As for page rendering, if you require the results from the database to render the page, and those arrive asynchronously, you can't really render the page synchronously.

Generally speaking it's best practice to make everything you can asynchronous rather than relying on synchronous functions ... in most cases that would be something like readFile vs. readFileSync. In your example, you're not doing anything synchronously with i/o. The only synchronous code you have is the logic of your program (which requires CPU and thus has to be synchronous in node) but these are tiny little things by comparison.
I'm not sure what Item is, but if I had to guess what .find().sort() does is build a query string internally to the system. It does not actually run the query (talk to the DB) until .exec is called. .exec takes a callback, so it will communicate with the DB asynchronously. When that communication is done, the callback is called.
Using process.nextTick does nothing in this case. That would just delay the calling of its code until the next event loop which there is no need to do. It has no effect on synchronicity or not.
I don't really understand your second question, but if the rendering of the page depends on the result of the query, you have to defer rendering of the page until the query completes -- you are doing this by rendering in the callback. The rendering itself res.render may not be entirely synchronous either. It depends on the internal mechanism of the library that defines the render function.
In your example, next is not defined. Instead your code should probably look like:
app.get('/directory', function(req, res) {
Item.
find().
sort( 'date-modified' ).
exec(function (err, items) {
if (err) {
console.error(err);
res.status(500).end("Database error");
}
else {
res.render('directory.ejs', {
items : items
});
}
});
});
});

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string