Webworker-threads: is it OK to use "require" inside worker? - node.js

(Using Sails.js)
I am testing webworker-threads ( https://www.npmjs.com/package/webworker-threads ) for long-running processes on Node, and the following example works fine:
var Worker = require('webworker-threads').Worker;
var fibo = new Worker(function() {
  function fibo(n) {
    return n > 1 ? fibo(n - 1) + fibo(n - 2) : 1;
  }
  this.onmessage = function (event) {
    try {
      postMessage(fibo(event.data));
    } catch (e) {
      console.log(e);
    }
  };
});
fibo.onmessage = function (event) {
  //my return callback
};
fibo.postMessage(40);
But as soon as I add any code to query MongoDB, it throws an exception:
(I'm not using the Sails model in the query, just to make sure the code can run on its own -- the db has no password.)
var Worker = require('webworker-threads').Worker;
var fibo = new Worker(function() {
  function fibo(n) {
    return n > 1 ? fibo(n - 1) + fibo(n - 2) : 1;
  }
  // MY DB TEST -- THIS WORKS FINE OUTSIDE THE WORKER
  function callDb(event) {
    var db = require('monk')('localhost/mydb');
    var users = db.get('users');
    users.find({ "firstName": "John" }, function (err, docs) {
      console.log("serviceSuccess");
      return fibo(event.data);
    });
  }
  this.onmessage = function (event) {
    try {
      postMessage(callDb(event.data)); // calling the db function now
    } catch (e) {
      console.log(e);
    }
  };
});
fibo.onmessage = function (event) {
  //my return callback
};
fibo.postMessage(40);
Since the DB code works perfectly fine outside the Worker, I think it has something to do with the require. I've tried something that also works outside the Worker, like
var moment = require("moment");
var deadline = moment().add(30, "s");
That code also throws an exception. Unfortunately, console.log only shows this for every type of error:
{Object}
{/Object}
So, the questions are: is there any restriction or guideline for using require inside a Worker? What could I be doing wrong here?
UPDATE
It seems threads will not allow external modules:
https://github.com/xk/node-threads-a-gogo/issues/22
TL;DR I think that if you need to require, you should use Node's cluster or child process. If you want to offload some CPU-busy work, you should use tagg and the load function to grab any helpers you need.
Upon reading this thread, I see that this question is similar to this one:
Load Nodejs Module into A Web Worker
To which audreyt, the webworker-threads author, answered:
author of webworker-threads here. Thank you for using the module!
There is a default native_fs_ object with the readFileSync you can use
to read files.
Beyond that, I've mostly relied on onejs to compile all required
modules in package.json into a single JS file for importScripts to
use, just like one would do when deploying to a client-side web worker
environment. (There are also many alternatives to onejs -- browserify,
etc.)
Hope this helps!
So it seems importScripts is the way to go. But at this point, it might be too hacky for what I want to do, so Kue is probably a more mature solution.
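For reference, the bundling approach from audreyt's answer could look roughly like this. It is only a sketch under assumptions: it presumes a bundle.js produced by browserify from the worker's pure-JavaScript helpers (bundling something like monk, which needs Node's core networking, would not work inside the thread), and that importScripts is available inside the worker as her reply implies:
// Bundle the worker's pure-JS helpers first, e.g.:
//   browserify worker-deps.js -o bundle.js
var Worker = require('webworker-threads').Worker;

var worker = new Worker(function () {
  importScripts('bundle.js'); // load the pre-bundled helpers instead of require()
  this.onmessage = function (event) {
    // ...use the bundled helpers here...
    postMessage(event.data);
  };
});

worker.onmessage = function (event) {
  console.log('worker replied with', event.data);
};
worker.postMessage(40);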

I'm a collaborator on the node-webworker-threads project.
You can't require in node-webworker-threads
You are correct in your update: node-webworker-threads does not (currently) support requireing external modules.
It has limited support for some of the built-ins, including file system calls and a version of console.log. As you've found, the version of console.log implemented in node-webworker-threads is not identical to the built-in console.log in Node.js; it does not, for example, automatically make nice string representations of the components of an Object.
In some cases you can use external modules, as outlined by audreyt in her response. Clearly this is not ideal, and I view the incomplete require as the primary "dealbreaker" of node-webworker-threads. I'm hoping to work on it this summer.
When to use node-webworker-threads
node-webworker-threads allows you to code against the WebWorker API and run the same code in the client (browser) and the server (Node.js). This is why you would use node-webworker-threads over node-threads-a-gogo.
node-webworker-threads is great if you want the most lightweight possible JavaScript-based workers, to do something CPU-bound. Examples: prime numbers, Fibonacci, a Monte Carlo simulation, offloading built-in but potentially-expensive operations like regular expression matching.
When not to use node-webworker-threads
node-webworker-threads emphasizes portability over convenience. For a Node.js-only solution, this means that node-webworker-threads is not the way to go.
If you're willing to compromise on full-stack portability, there are two ways to go: speed and convenience.
For speed, try a C++ add-on. Use NaN. I recommend Scott Frees's C++ and Node.js Integration book to learn how to do this; it'll save you a lot of time. You'll pay for it in needing to brush up on your C++ skills, and if you want to work with MongoDB then this probably isn't a good idea.
For convenience, use a Child Process-based worker pool like fork-pool. In this case, each worker is a full-fledged Node.js instance. You can then require to your heart's content. You'll pay for it in a larger application footprint and in higher communication costs compared to node-webworker-threads or a C++ add-on.
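As a rough illustration of the Child Process route (plain child_process here rather than fork-pool's API; the file names are hypothetical), each forked worker is a full Node.js instance, so require('monk') works as usual:
// main.js -- hand the Mongo query off to a separate Node.js process
var fork = require('child_process').fork;
var worker = fork('./db-worker.js'); // hypothetical file name

worker.on('message', function (result) {
  console.log('result from worker:', result);
});
worker.send({ firstName: 'John' });

// db-worker.js -- a full Node.js instance, so require works normally
var db = require('monk')('localhost/mydb');
var users = db.get('users');

process.on('message', function (query) {
  users.find({ firstName: query.firstName }, function (err, docs) {
    if (err) return process.send({ error: err.message });
    process.send(docs); // results must be JSON-serializable to cross the process boundary
  });
});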

Related

Azure function run code on startup for Node

I am developing a chatbot using Azure Functions. I want to load some of the conversation data for the chatbot from a file. I am looking for a way to load this conversation data before the function app starts, with some function callback. Is there a way to load the conversation data only once, when the function app is started?
This question is essentially a duplicate of Azure Function run code on startup, but that question is about C# and I wanted a way to do the same thing in Node.js.
After like a week of messing around I got a working solution.
First some context:
The question at hand: running custom code at App Start for Node.js Azure Functions.
The issue is currently being discussed here and has been open for almost 5 years, and doesn't seem to be going anywhere.
As of now there is an Azure Functions "warmup" trigger feature, found here: AZ Funcs Warm Up Trigger. However, this trigger only runs on scale-out, so the first, initial instance of your app won't run the "warmup" code.
Solution:
I created a start.js file and put the following code in there
const ErrorHandler = require('./Classes/ErrorHandler');
const Validator = require('./Classes/Validator');
const delay = require('delay');

let flag = false;

module.exports = async () => {
  console.log('Initializing Globals');
  global.ErrorHandler = ErrorHandler;
  global.Validator = Validator;
  // this is just to test if it will work with async funcs
  await delay(5000);
  // add additional logic...
  // await db.connect(); etc // initialize a db connection
  console.log('Done Waiting');
};
To run this code I just have to do
require('../start')();
in any of my functions. Just one function is fine. Since all of the function dependencies are loaded when you deploy your code, as long as this line is in one of the functions, start.js will run and initialize all of your global/singleton variables, or whatever else you want it to do on function start. I made a literal function called "startWarmUp", which is just a timer-triggered function that runs once a day.
My use case is that almost every function relies on ErrorHandler and Validator class. And though generally making something a global variable is bad practice, in this case I didn't see any harm in making these 2 classes global so they're available in all of the functions.
Side Note: when developing locally you will have to include that function in your func start --functions <function requiring start.js> <other funcs> in order to have that startup code actually run.
Additionally, there is a feature request for this functionality that can be voted on here: Azure Feedback
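For illustration, the function that pulls in the startup module could look like this (a sketch only; the timer-trigger binding and folder name are assumptions, not from the original post):
// startWarmUp/index.js -- a timer-triggered function whose only job is to make
// sure start.js has run and initialized the globals
require('../start')();

module.exports = async function (context, myTimer) {
  context.log('Warm-up ran; ErrorHandler and Validator are now on global');
};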
I have a similar use case that I am also stuck on.
Based on this resource I have found a good way to approach the structure of my code. It is simple enough: you just need to run your initialization code before you declare your module.exports.
https://github.com/rcarmo/azure-functions-bot/blob/master/bot/index.js
I also read this thread, but it does not look like there is a recommended solution.
https://github.com/Azure/azure-functions-host/issues/586
However, in my case I have an additional complication in that I need to use promises, as I am waiting on external services to come back. These promises run within bot.initialise(). initialise() only seems to run when the first call to the bot occurs. That would be fine, but as it is running a promise, my code doesn't block -- which means that when 'listener(req, context.res)' is called, listener doesn't exist yet.
The next thing I will try is to restructure my code so that bot.initialise returns a promise, but the code would be much simpler if there were an initialisation webhook that guaranteed the code within it was executed at startup, before everything else.
Has anyone found a good workaround?
My code looks something like this:
var listener = null;

if (process.env.FUNCTIONS_EXTENSION_VERSION) {
  // If we are inside Azure Functions, export the standard handler.
  listener = bot.initialise(true);
  module.exports = function (context, req) {
    context.log("Passing body", req.body);
    listener(req, context.res);
  };
} else {
  // Local server for testing
  listener = bot.initialise(false);
}
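The restructuring mentioned above might look roughly like this, assuming bot.initialise can be changed to return a promise that resolves with the listener (a sketch, not the actual bot code):
// Assumes bot.initialise(isAzure) now returns a Promise that resolves with the listener
var listenerPromise = bot.initialise(true);

module.exports = async function (context, req) {
  var listener = await listenerPromise; // resolves once; later calls reuse the same listener
  context.log("Passing body", req.body);
  listener(req, context.res);
};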
You can use a global variable to load data before function execution.
var data = [1, 2, 3];

module.exports = function (context, req) {
  context.log(data[0]);
  context.done();
};
The data variable is initialized only once and is reused across function calls.
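Tying this back to the original question, here is a minimal sketch of loading conversation data from a file once per process, at module load time (the file name and data shape are assumptions):
const fs = require('fs');

// Runs once, when the function module is first required by the host
const conversations = JSON.parse(
  fs.readFileSync(__dirname + '/conversations.json', 'utf8') // hypothetical file
);

module.exports = async function (context, req) {
  context.log('Loaded', conversations.length, 'conversation entries');
  context.res = { body: conversations[0] };
};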

Sinon - when to use spies/mocks/stubs or just plain assertions?

I'm trying to understand how Sinon is used properly in a Node project. I have gone through examples, and the docs, but I'm still not getting it. I have set up a directory with the following structure to try to work through the various Sinon features and understand where they fit in:
|-- lib
|   |-- index.js
|-- test
|   |-- test.js
index.js is
var myFuncs = {};

myFuncs.func1 = function () {
  myFuncs.func2();
  return 200;
};

myFuncs.func2 = function (data) {
};

module.exports = myFuncs;
test.js begins with the following
var assert = require('assert');
var sinon = require('sinon');
var myFuncs = require('../lib/index.js');
var spyFunc1 = sinon.spy(myFuncs.func1);
var spyFunc2 = sinon.spy(myFuncs.func2);
Admittedly this is very contrived, but as it stands I would want to test that any call to func1 causes func2 to be called, so I'd use
describe('Function 2', function () {
  it('should be called by Function 1', function () {
    myFuncs.func1();
    assert(spyFunc2.calledOnce);
  });
});
I would also want to test that func1 will return 200 so I could use
describe('Function 1', function () {
  it('should return 200', function () {
    assert.equal(myFuncs.func1(), 200);
  });
});
but I have also seen examples where stubs are used in this sort of instance, such as
describe('Function 1', function () {
  it('should return 200', function () {
    var test = sinon.stub().returns(200);
    assert.equal(myFuncs.func1(test), 200);
  });
});
How are these different? What does the stub give that a simple assertion test doesn't?
What I am having the most trouble getting my head around is how these simple testing approaches would evolve once my program gets more complex. Say I start using MySQL and add a new function:
myFuncs.func3 = function (data, callback) {
  connection.query('SELECT name FROM users WHERE name IN (?)', [data], function (err, rows) {
    if (err) throw err;
    var names = _.pluck(rows, 'name');
    return callback(null, names);
  });
};
I know that when it comes to databases some advise having a test db for this purpose, but my end goal might be a db with many tables, and it could be messy to duplicate this for testing. I have seen references to mocking a db with Sinon, and tried following this answer, but I can't figure out the best approach.
You have asked several different questions in one post... I will try to sort them out.
1. Testing myFuncs with two functions.
Sinon is a mocking library with a wide feature set. "Mocking" means you are supposed to replace some part of what is going to be tested with mocks or stubs. There is a good article in the Sinon documentation which describes the difference well.
When you created a spy in this case...
var spyFunc1 = sinon.spy(myFuncs.func1);
var spyFunc2 = sinon.spy(myFuncs.func2);
...you've just created a watcher. The spy wraps the function: it records the calling arguments and then calls the real function. (Note that sinon.spy(myFuncs.func2) returns a new wrapped function without replacing myFuncs.func2 on the object, so internal calls made through myFuncs.func2 are not recorded; use sinon.spy(myFuncs, 'func2') for that.) This is a possible scenario, but mind that all the possibly complicated logic of myFuncs.func1/func2 will still run when called in the test (e.g. a database query).
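A minimal sketch of the object-method form (my own illustration, not from the original answer):
var spy = sinon.spy(myFuncs, 'func2'); // replaces myFuncs.func2 with a recording wrapper

myFuncs.func1();                 // func1's internal call now goes through the spy
console.log(spy.calledOnce);     // true
console.log(spy.firstCall.args); // the arguments func2 received (none here)

myFuncs.func2.restore();         // put the original func2 back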
2.1. The describe('Function 1', ...) test suite looks too contrived to me.
It is not obvious which question you actually mean.
A function that returns a constant value is not a real-life example. In most cases there will be some parameters, and the function under test will implement some algorithm for transforming the input arguments. So in your test you're going to partly re-implement the same algorithm to check that the function works correctly. That is where TDD comes into play: it actually supposes you start implementation from a test and take parts of the unit-test code to implement the method being tested.
2.2. Stub. The second version of the unit test looks useless in the given example: func1 does not accept any parameter.
var test = sinon.stub().returns(200);
assert.equal(myFuncs.func1(test), 200);
Even if you replace the return part with 100, the test will still run successfully.
The version that makes sense is, for example, replacing func2 with a stub to avoid a heavy calculation or remote request (DB query, HTTP or other API request) being launched in a test:
myFuncs.func2 = sinon.spy();
assert.equal(myFuncs.func1(), 200);
assert(myFuncs.func2.calledOnce);
The basic rule for unit testing is that unit test should be kept as simple as possible, providing check for as smallest possible fragment of code. In this test func1 is being tested so we can neglect the logic of func2. Which should be tested in another unit test.
Mind that doing the following attempt is useless:
myFuncs.func1 = sinon.stub().returns(200);
assert.equal(myFuncs.func1(), 200);
Because in this case you have masked the real func1 logic with a stub, and you're actually testing sinon.stub().returns(). Believe me, it works well! :D
3. Mocking database queries.
Mocking a database has always been a hurdle. I can offer some advice.
3.1. Have a well-fragmented environment. Even for a small project there should be completely independent development, staging and production environments, including databases. This means you have an automated way of creating the DB: scripts or an ORM.
In this scenario you can easily maintain a test DB in your test engine, using before()/beforeEach() to give your tests a clean structure to run against; a sketch follows below.
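A minimal Mocha sketch of that idea; createSchema and seedFixtures are hypothetical placeholders for your own DB scripts, and dal.findUsersByName is an assumed DAL method:
describe('users DAL', function () {
  before(function () {
    return createSchema();   // hypothetical: build the tables in the test DB
  });

  beforeEach(function () {
    return seedFixtures();   // hypothetical: reset the rows to a known state
  });

  it('finds users by name', function (done) {
    dal.findUsersByName('John', function (err, names) {
      assert.ifError(err);
      assert.deepEqual(names, ['John']);
      done();
    });
  });
});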
3.2. Have well-fragmented code. There should be several layers; the lowest one (the DAL) should be separated from the business logic. In that case you write tests for the business class while simply mocking the DAL, as sketched below. To test the DAL itself you can use the approach you mentioned (sinon.mock the whole module) or specific libraries (e.g. replace the db engine with SQLite for tests, as described here).
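And stubbing that hypothetical DAL while testing business logic could look like this (dal and userService are illustrative names, not from the question):
it('greets users found in the DB', function () {
  // Replace the DAL call so no real query runs; the stub yields a canned result
  var stub = sinon.stub(dal, 'findUsersByName').yields(null, ['John']);

  userService.greet('John', function (err, message) {
    assert.ifError(err);
    assert.equal(message, 'Hello John');
    assert(stub.calledWith('John'));
    stub.restore();
  });
});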
Conclusion. "how these simple testing approaches would evolve once my program gets more complex".
It is hard to maintain unit tests unless you develop your application with tests in mind and, thus, keep it well fragmented. Stick to the main rule: keep each unit test as small as possible. Otherwise you're right, it will eventually get messy, because the constantly evolving logic of your application will become entangled with your test code.

Best way to manage unique child processes in node.js

I'm about to start coding a chat bot. However, I plan on running more than one, using a wrapper to communicate with them and restart them. I have done this in the past with child_process.fork(), but it was incredibly inefficient. I've looked into spawn and cluster as well, but they all seem to focus on running the same thing, not unique bots. As for plugins, I've looked into fleet, forkfriend, and workerfarm, but none seem to fit my needs.
Is there any plugin or approach I'm not seeing to help me do this? Or am I just going to have to wing it again?
You can have as many chat bots as you wish in a single process. The rule of thumb in Node.js is to use one process per processor core, since Node has a different threading model from what you might be used to.
Assuming you still need some multithreading on top of this, here are a couple of Node modules you might find fitting your needs:
node-webworker-threads, dnode.
UPDATE:
Now I see what you need. There is a nice example in the Node.js docs, which I saw recently. I'll just copy and paste it here:
var normal = require('child_process').fork('child.js', ['normal']);
var special = require('child_process').fork('child.js', ['special']);

// Open up the server and send sockets to child
var server = require('net').createServer();
server.on('connection', function (socket) {
  // if this is a VIP
  if (socket.remoteAddress === '74.125.127.100') {
    special.send('socket', socket);
    return;
  }
  // just the usual dudes
  normal.send('socket', socket);
});
server.listen(1337);
child.js looks like this:
process.on('message', function (m, socket) {
  if (m === 'socket') {
    socket.end('You were handled as a ' + process.argv[2] + ' person');
  }
});
I believe it's pretty much what you need: launch several processes with different configs (if the number of configs is relatively low) and pass each socket to a particular one from the master process.
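Since the bots are unique rather than clones of one worker, a variation on that example could fork a different script per bot (with its own config) and restart it when it exits; the file names here are hypothetical:
var fork = require('child_process').fork;

function startBot(script, config) {
  var bot = fork(script, [JSON.stringify(config)]);
  bot.on('exit', function (code) {
    console.log(script + ' exited with code ' + code + ', restarting');
    startBot(script, config); // naive restart-on-exit wrapper
  });
  return bot;
}

startBot('./irc-bot.js', { channel: '#node' });
startBot('./slack-bot.js', { room: 'general' });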

synchronous sqlite transactions node

I installed sqlite3 with npm
npm install sqlite3 --save
I have written some basic functions that I would like to be performed synchronously rather than asynchronously.
For instance, I would like to get the column names in one function and the row count in another.
I would like to simply return these values. Currently I am using a callback, like so:
d.cinfo = function (table, callback) {
  var o = {};
  db.each("PRAGMA table_info(" + table + ")", function (err, col) {
    o[col.name] = col.type;
  }, function () {
    if (typeof callback == 'function') callback(o);
  });
};
d is an object which is later exposed. However, I'd like to return the values instead:
d.cinfo = function (table, callback) {
  var o = {};
  db.each("PRAGMA table_info(" + table + ")", function (err, col) {
    o[col.name] = col.type;
  }, function () {
    return o;
  });
};
Is there a way I can achieve this? I found documentation saying it was possible, but then I learned that it was outdated, and I'm not sure it's meant for the same API.
I have implemented bluebird; however, Promise.promisifyAll(middleware) returns an error "Object # has no method .then()". Does anyone know what I'm doing wrong?
I've come across problems like this and many others while using Express, and it really drives me crazy when I just can't avoid callback hell.
Some possible solutions:
Using the same old event-driven style: instead of calling nested functions, emit events for them. I'm assuming you wouldn't like that very much, but it at least gets rid of the nested callbacks.
There's an npm module called bluebird that allows you to extend existing modules with promises (using promisifyAll). Promises will trim down the code, and you'll have much cleaner code than before (a bluebird sketch follows further below).
If you don't mind switching frameworks, you can try KoaJS, which allows you to yield functions (fun stuff from ES6) using generator functions (another cool feature). You can read more about this here.
So the above code can be re-written like:
var rows = yield db.each() //or something like that
This is an amazing feature, but the drawback is that you have to use unstable versions of Node (0.11.x). We also faced some issues setting up HTTPS on 0.11.x and had to downgrade to 0.10.x (we're not using Koa either).
If there are only a few places where you need to do this kind of thing, I would recommend using bluebird; but if you're tired of having lots of callbacks in your code, go for Koa, although I've already mentioned the tradeoffs.
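As a sketch of the bluebird route (my own illustration; note that promisifyAll is applied to the sqlite3 prototypes rather than to arbitrary middleware, and db.all is easier to promisify than db.each because it has a single final callback):
var Promise = require('bluebird');
var sqlite3 = require('sqlite3');

// Adds runAsync, getAsync, allAsync, ... to every Database and Statement instance
Promise.promisifyAll(sqlite3.Database.prototype);
Promise.promisifyAll(sqlite3.Statement.prototype);

var db = new sqlite3.Database('mydb.sqlite');
var d = {}; // the object that is exposed later, as in the question

d.cinfo = function (table) {
  return db.allAsync("PRAGMA table_info(" + table + ")").then(function (cols) {
    var o = {};
    cols.forEach(function (col) { o[col.name] = col.type; });
    return o; // callers get a promise for the column map
  });
};

// Usage: d.cinfo('users').then(function (info) { console.log(info); });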

Node.js // A module to manage a sqlite database

I managed to create a module to handle all the database calls. It uses this lib: https://github.com/developmentseed/node-sqlite3
My issues are the following.
Every time I make a call, I need to make sure the database exists, and if not, create it.
Plus, as all the calls are asynchronous, I end up with loads of functions within functions within callbacks... etc.
It pretty much looks like this:
getUsers : function (callback) {
  var _aUsers = [];
  var that = this;
  this._setupDb(function () {
    var db = that.db;
    db.all("SELECT * FROM t_client", function (err, rows) {
      rows.forEach(function (row) {
        _aUsers.push({ "cli_id": row.id, "cli_name": row.cli_name, "cli_path": row.cli_path });
      });
      callback(_aUsers);
    });
  });
},
So, is there any way I can export my module only when the database is ready and fully created if it does not exist yet?
Does anyone see a way around the "asynchronous" issue?
You could also try using promises or fibers ...
I don't think so. If you make it synchronous, you are taking away the advantage; JavaScript functions are meant to work that way. Such a situation is referred to as callback hell. If you are facing problems managing callbacks, you can use these libraries (an async-based sketch follows after the links below):
underscore
async
See these guides to understand the basics of asynchronous programming:
Node.js: Style and structure
Using underscore.js managing-callback-spaghetti-in-nodejs
Using async.js node-js-async-programming
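As an illustration of the async route applied to the getUsers example above (a sketch only; it assumes the module object -- here called dao -- still provides _setupDb and db as in the question, and it switches to an error-first callback):
var async = require('async');

function getUsers(dao, callback) {
  async.waterfall([
    function (next) {
      dao._setupDb(function () { next(null, dao.db); }); // make sure the db exists first
    },
    function (db, next) {
      db.all("SELECT * FROM t_client", next);            // (err, rows) flows straight to next
    }
  ], function (err, rows) {
    if (err) return callback(err);
    callback(null, rows.map(function (row) {
      return { cli_id: row.id, cli_name: row.cli_name, cli_path: row.cli_path };
    }));
  });
}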
