Background process, loading bar - node.js

Most server-side-scripting languages have an exec function (node, php, ruby, etc). This allows the programming language to interact with the shell.
I want to use exec() in node.js to run long-running processes, things I used to do with AJAX requests in the browser, and show the user a simple progress / loading bar.
I found out the hard way that, because exec only invokes its callback once the child has finished, the example below takes over 5 seconds to respond.
What I want is some way to update the content of the browser (via AJAX) with the current state of the execution, like a loading bar. But I don't want the script being run to depend on the browser.
Any ideas?
my route
exports.exec = function(req,res){
// http://nodejs.org/api.html#_child_processes
var sys = require('sys')
var exec = require('child_process').exec;
var child;
child = exec("node exec/test.js", function (error, stdout, stderr) {
var o = {
"error":error,
"stdout":stdout,
"stderr":stderr,
};
o = JSON.stringify(o);
res.send(o);
});
};
my test.js file
// sys.print() has been removed from Node.js; process.stdout.write() is the equivalent
var count = 0;
var interval = setInterval(
function(){
process.stdout.write('hello'+count);
count++;
if(count == 5){
clearInterval(interval);
}
},
1000);

You should use socket.io for this.
Here is a demo app to get you started with socket.io
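You emit an event for every 1% of progress using socket.io; the browser listens for it and updates a bar.
You can't use exec for this, because it buffers the output and only hands it to the callback when the process ends; you need streamed output.
Use child_process.spawn instead.
On the server: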
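var spawn = require('child_process').spawn,
// spawn takes the command and its arguments separately
child = spawn('node', ['exec/test.js']);
child.stdout.on('data', function (message) {
// message is a Buffer; `socket` is assumed to be a connected socket.io socket
socket.emit('process', message.toString());
});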
On the sub process:
console.log('10%')
// ...
console.log('20%')
// ...
console.log('30%')
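As a rough sketch of the server side, the percentages can be parsed out of the child's stdout before they are emitted. The helper name createProgressParser and the inline child script are made up for illustration (the inline script stands in for exec/test.js), and the console.log call marks the spot where socket.emit would go. Chunks from a stream can end mid-line, so the parser buffers partial lines.

```javascript
const { spawn } = require('child_process');

// Turns raw stdout chunks into progress numbers, buffering partial lines.
function createProgressParser(onProgress) {
  let buffered = '';
  return function feed(chunk) {
    buffered += chunk.toString();
    const lines = buffered.split('\n');
    buffered = lines.pop(); // keep the trailing partial line for the next chunk
    for (const line of lines) {
      const match = /^(\d+)%$/.exec(line.trim());
      if (match) onProgress(Number(match[1]));
    }
  };
}

// Hypothetical child: an inline script standing in for exec/test.js.
const childSource = "console.log('10%'); console.log('20%'); console.log('30%');";
const child = spawn(process.execPath, ['-e', childSource]);

// In the real route you would call socket.emit('process', percent) here.
child.stdout.on('data', createProgressParser(function (percent) {
  console.log('progress:', percent);
}));
```

Emitting a number rather than the raw text means the browser does not have to parse anything before updating the bar.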
If your sub process is a node script you could do something a lot more elegant.
Rather than having to talk over the stdout stream, you could use node's cluster module to send messages between the master and the worker processes.
I made a fork of the previous demo app adding a route /exec which demonstrates how to achieve this.
When I have more time I'll make another demo app; this is quite an interesting and educational test. Thanks for the idea :D.

Related

How to fork a process in node that writes express response

I'd like to fork a long running express request in node and send an express response with the child, allowing the parent to serve other requests. I'm already using cluster but I'd like to fork another process in addition to the cluster for specific long running requests. What I'd like to prevent is all the processes in the cluster being consumed by a specific long running processes, while most of the other requests are fast.
Thanks
var express = require('express');
var webserver = express();
webserver.get("/test", function(request, response) {
// long running HTTP request
response.send(...);
});
What I'm thinking of is something like following, although I'm not sure this works:
var cp = require('child_process');
var express = require('express');
var webserver = express();
webserver.get("/test", function(request, response) {
var child = cp.fork('do_nothing.js');
child.on("message", function(message) {
if(message == "start") {
response.send(...);
process.exit();
}
});
child.send("start");
});
Let me know if anyone knows how to do this.
Edit: So, the idea is that the child could take a long time. There are a limited number of processes in the cluster serving express responses and I don't want to consume them all on a specific long-running request type. In the code below, the entire cluster would be consumed by the long running express requests.
while(1) {
if(rand() % 100 == 0) {
if(fork() == 0) {
sleep(hour(1));
exit(0);
}
} else {
sleep(second(1));
}
waitpid(WAIT_ANY, &status, WNOHANG);
}
Edit: I am going to mark the self-answer as solved. I'm sure there's a way to pass a socket to a child but it's not really necessary because the cluster master can manage all child processes. Thanks for your help.
Your second code block is confusing because it appears that you're killing the parent process with process.exit() rather than the child.
In any case, if we assume the problem is this:
You have a cluster of "regular processes".
Occasionally, you want to take an incoming request that was assigned to one of the cluster processes and pass it off to a long running child that will eventually send the response.
After sending the response, the long running child process should exit.
You have a couple options.
You can have the clustered process that was assigned the request, start up a child, send it some initial data and listen for a message back from the child. When it gets the message back from the child, it can send the response and kill the child. This appears to be what you're attempting to do in your second code block.
You can have the clustered process that was assigned the request, start up a child and reassign the request socket to the child process and the child can then own that socket from then on. When it finally sends the response, it can then exit itself.
The first is simpler because no socket assignment from one process to another is required. To implement the second, you'd have to write or find the code to do the socket reassignment and then reconstitute the socket as an express request within the child. The cluster module does something like this, so the code is there to be found and learned from, but I'm not aware of a trivial way to do it.
Personally, I don't see any particular downside to the first. I suppose if the clustered process were to die for some reason, you'd lose the long running request socket, but hopefully you can just code your clustered processes not to die unnecessarily.
You can read this article on sending a socket to a new node.js process:
Sending a socket to a forked process
And, this node.js doc on sending a socket:
Example: sending a socket object
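A minimal sketch of the first option (the process that received the request forks a child, waits for its message, then answers and kills the child) might look like this. The worker script, its temp-file location, the runInChild helper and the { value } / { result } message shapes are all invented for illustration; the timer inside the worker stands in for the long-running work.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical worker script, written to a temp file so the sketch is self-contained.
const workerPath = path.join(os.tmpdir(), 'slow_worker_' + process.pid + '.js');
fs.writeFileSync(workerPath, `
  process.on('message', function (job) {
    // Stand-in for the long-running work.
    setTimeout(function () {
      process.send({ result: job.value * 2 });
    }, 100);
  });
`);
// Remove the temp file when this process exits.
process.on('exit', function () {
  try { fs.unlinkSync(workerPath); } catch (e) { /* already gone */ }
});

// Fork a child, hand it the job, and resolve with its first reply.
function runInChild(job) {
  return new Promise(function (resolve, reject) {
    const child = fork(workerPath);
    child.once('message', function (reply) {
      child.kill(); // the parent decides when the child goes away
      resolve(reply);
    });
    child.once('error', reject);
    child.send(job);
  });
}

// Inside an Express route this would become something like:
//   runInChild({ value: 21 }).then(function (reply) { response.send(reply); });
runInChild({ value: 21 }).then(function (reply) {
  console.log(reply);
});
```

The parent stays free to serve other requests while it waits, because it only reacts to the child's message event.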
So, I've verified that this is not necessary for my use case, but I was able to get it working using the code below. It's not exactly what the OP asks for, but it works.
What it's doing is sending an instruction to the cluster master, which forks the additional process upon receipt of the slow express request.
Since the express request doesn't need to know the status of the newly forked cluster worker, it just handles the slow request as normal and then exits.
The instruction to the cluster master informs the master not to replace the dying slow express request process, so the number of workers reverts to the original number after the slow request finishes.
The pool grows while slow requests are in flight, then reverts to its normal size. This prevents, say, 20 simultaneous slow requests from bringing down the cluster.
var numberOfWorkers = 10;
var workerCount = 0;
var slowRequestPids = { };
if (cluster.isMaster) {
for(var i = 0; i < numberOfWorkers; i++) {
workerCount++;
cluster.fork();
}
cluster.on('exit', function(worker) {
workerCount--;
var pidString = String(worker.process.pid);
if(pidString in slowRequestPids) {
delete slowRequestPids[pidString];
if(workerCount >= numberOfWorkers) {
logger.info('not forking replacement for slow process');
return;
}
}
logger.info('forking replacement for a process that died unexpectedly');
workerCount++;
cluster.fork();
});
// Since Node.js 6 the master's 'message' listener receives (worker, msg, handle)
cluster.on("message", function(worker, msg) {
if(typeof msg.fork != "undefined" && workerCount < 100) {
logger.info("forking additional process upon slow request");
slowRequestPids[msg.fork] = 1;
workerCount++;
cluster.fork();
}
});
return;
}
webserver.use("/slow", function(req, res) {
process.send({fork: String(process.pid) });
sleep.sleep(300);
res.send({ response_from: "virtual child" });
res.on("finish", function() {
logger.info('process exits, restoring cluster to original size');
process.exit();
});
});

How do I exit if there is no new output within a defined timeframe

I'm running a NodeJS server, which becomes hung due to various factors. Every time this happens, I have to restart the NodeJS server to resolve the issue. Is there any workaround that exits the process automatically if there is no output within a defined timeframe? I suppose setTimeout() and the "process" module are needed, however, I'm unaware of how to achieve it elegantly.
Here's how you would exit a process after a timespan with setTimeout(). It's worth noting, though, that you should really fix the underlying error in your process first.
var input; // your input
var timeout = 60000; // timeout amount
// setTimeout takes the callback first, then the delay in milliseconds
setTimeout(function() {
if (!input)
process.exit(1);
}, timeout);
The process module and setTimeout() are globals; they're automatically available to your module from the Node.js core API.
Edit- ChildProcess
Since your process is "hanging" because of this TCP module, take the module that implements it and run it using child_process.fork(). Then you can react to the lack of output from the module and kill the process accordingly.
Create a new file and put the following in it
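var childProcess = require('child_process'); // don't shadow the global `process`
// Spawn your Hanging Code; `silent: true` pipes the child's stdout to the parent so tcp.stdout is not null
var tcp = childProcess.fork('<pathToYourTcpCode>', [], { silent: true });
var timeout = 60000; // timeout amount
var output = '';
tcp.stdout.on('data', function(data) {
output += data;
});
setTimeout(function() {
if (!output)
tcp.kill('SIGINT');
}, timeout);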
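The question asks about exiting when there is no new output within a timeframe, which suggests a timer that is reset every time output arrives rather than a single one-shot timeout. The following is only a sketch of that idea: createWatchdog is a made-up helper name, the 500 ms window is arbitrary, and the inline child script (which prints once and then goes quiet) stands in for the hanging code.

```javascript
const { spawn } = require('child_process');

// Creates a watchdog that calls onIdle if touch() is not called for idleMs.
function createWatchdog(idleMs, onIdle) {
  let timer = setTimeout(onIdle, idleMs);
  return {
    // Call this whenever new output arrives to restart the countdown.
    touch: function () {
      clearTimeout(timer);
      timer = setTimeout(onIdle, idleMs);
    },
    // Call this once the watched process is gone.
    stop: function () {
      clearTimeout(timer);
    }
  };
}

// Hypothetical hanging child: prints once, then stays alive without output.
const hangingSource = "console.log('started'); setInterval(function () {}, 1000);";
const child = spawn(process.execPath, ['-e', hangingSource]);

const watchdog = createWatchdog(500, function () {
  console.log('no output for 500 ms, killing child');
  child.kill();
});

child.stdout.on('data', function () { watchdog.touch(); });
child.on('exit', function () { watchdog.stop(); });
```

A supervising parent like this can then restart the child after killing it, which avoids restarting the server by hand.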

Selenium Webdriver JS Scraping Parallely [nodejs]

I'm trying to create a pool of Phantom Webdrivers [using webdriverjs] like
var driver = new Webdriver.Builder().withCapabilities(Webdriver.Capabilities.phantomjs()).build();
Once the pool is populated [I see n phantom processes spawned], I call driver.get on different URLs [using different drivers in the pool], expecting them to run in parallel [as driver.get is async].
But I always see them being done sequentially.
Can't we load different URLs in parallel via different web driver instances?
If that isn't possible this way, how else could I solve this issue?
A very basic implementation of my question would look like this:
var Webdriver = require('selenium-webdriver');
function getInstance() {
return new Webdriver.Builder().withCapabilities(Webdriver.Capabilities.phantomjs()).build();
}
var pool = [];
for (var i = 0; i < 3; i++) {
pool.push(getInstance());
}
pool[0].get("http://mashable.com/2014/01/14/outdated-web-features/").then(function () {
console.log(0);
});
pool[1].get("http://google.com").then(function () {
console.log(1);
});
pool[2].get("http://techcrunch.com").then(function () {
console.log(2);
});
PS: Have already posted it here
Update:
I tried selenium grid with the following setup, as it was mentioned that it can run tests in parallel:
Hub:
java -jar selenium/selenium-server-standalone-2.39.0.jar -host 127.0.0.1 -port 4444 -role hub -nodeTimeout 600
Phantom:
phantomjs --webdriver=7777 --webdriver-selenium-grid-hub=http://127.0.0.1:4444 --debug=true
phantomjs --webdriver=7877 --webdriver-selenium-grid-hub=http://127.0.0.1:4444 --debug=true
phantomjs --webdriver=6777 --webdriver-selenium-grid-hub=http://127.0.0.1:4444 --debug=true
I still see the get commands being queued and executed sequentially instead of in parallel. [But they are properly distributed across the 3 instances]
Am I still missing something out?
Why is it mentioned "scale by distributing tests on several machines ( parallel execution )" in the doc?
What does parallel mean as far as the hub is concerned? I'm at a loss.
I think I've found the issue.
Basically https://code.google.com/p/selenium/source/browse/javascript/node/selenium-webdriver/executors.js#39 is a synchronous and blocking operation [at least the get].
Whenever the get command is issued, node's main thread gets stuck there. No further code is executed.
A little late but for me it worked with webdriver.promise.createFlow.
You just have to wrap your code in webdriver.promise.createFlow(function() { ... }); and it works for me! Here's an example from Make parallel requests to a Selenium Webdriver grid. All thanks to the answerer there...
var flows = [0,1,2,3].map(function(index) {
return webdriver.promise.createFlow(function() {
var driver = new webdriver.Builder().forBrowser('firefox').usingServer('http://someurl:44111/wd/hub/').build();
console.log('Get');
driver.get('http://www.somepage.com').then(function() {
console.log('Screenshot');
driver.takeScreenshot().then(function(data){
console.log('foo/test' + index + '.png');
//var decodedImage = new Buffer(data, 'base64')
driver.quit();
});
});
});
});
I had the same issue, and I finally got around it using child_process.
My app is set up with many tasks that do different things and need to run simultaneously (each uses a different driver instance); obviously that was not working.
I now start those tasks in a child_process (each of which runs in a new V8 process) and everything does run in parallel.
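The one-process-per-task approach could be sketched roughly as follows. This is not the answerer's actual code: the scraping is replaced by a hypothetical stand-in (a timer that prints the URL it was given), and runTask / runAll are made-up helper names. A real task script would build its own WebDriver instance and call driver.get(url) there.

```javascript
const { execFile } = require('child_process');

// Stand-in task: with `node -e <source> <url>`, the URL arrives as process.argv[1].
const taskSource =
  "setTimeout(function () { console.log('done ' + process.argv[1]); }, 200);";

// Run one task in its own Node.js process and resolve with what it printed.
function runTask(url) {
  return new Promise(function (resolve, reject) {
    execFile(process.execPath, ['-e', taskSource, url], function (error, stdout) {
      if (error) return reject(error);
      resolve(stdout.trim());
    });
  });
}

// Start every task immediately; the processes run side by side.
function runAll(urls) {
  return Promise.all(urls.map(runTask));
}

runAll(['http://google.com', 'http://techcrunch.com']).then(function (results) {
  console.log(results);
});
```

Because each task lives in a separate process, a blocking call in one of them cannot stall the others.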

How do I stop a Node.js HTTP server programmatically such that the process exits?

I'm writing some tests and would like to be able to start/stop my HTTP server programmatically. Once I stop the HTTP server, I would like the process that started it to exit.
My server is like:
// file: `lib/my_server.js`
var LISTEN_PORT = 3000
function MyServer() {
http.Server.call(this, this.handle)
}
util.inherits(MyServer, http.Server)
MyServer.prototype.handle = function(req, res) {
// code
}
MyServer.prototype.start = function() {
this.listen(LISTEN_PORT, function() {
console.log('Listening for HTTP requests on port %d.', LISTEN_PORT)
})
}
MyServer.prototype.stop = function() {
this.close(function() {
console.log('Stopped listening.')
})
}
The test code is like:
// file: `test.js`
var MyServer = require('./lib/my_server')
var my_server = new MyServer();
my_server.on('listening', function() {
my_server.stop()
})
my_server.start()
Now, when I run node test.js, I get the stdout output that I expect,
$ node test.js
Listening for HTTP requests on port 3000.
Stopped listening.
but I have no idea how to get the process spawned by node test.js to exit and return back to the shell.
Now, I understand (abstractly) that Node keeps running as long as there are bound event handlers for events that it's listening for. In order for node test.js to exit to the shell upon my_server.stop(), do I need to unbind some event? If so, which event and from what object? I have tried modifying MyServer.prototype.stop() by removing all event listeners from it but have had no luck.
I've been looking for an answer to this question for months and I've never yet seen a good answer that doesn't use process.exit. It's quite strange to me that it is such a straightforward request but no one seems to have a good answer for it or seems to understand the use case for stopping a server without exiting the process.
I believe I might have stumbled across a solution. My disclaimer is that I discovered this by chance; it doesn't reflect a deep understanding of what's actually going on. So this solution may be incomplete or maybe not the only way of doing it, but at least it works reliably for me. In order to stop the server, you need to do two things:
Call .end() on the client side of every opened connection
Call .close() on the server
Here's an example, as part of a "tape" test suite:
test('mytest', function (t) {
t.plan(1);
var server = net.createServer(function(c) {
console.log("Got connection");
// Do some server stuff
}).listen(function() {
// Once the server is listening, connect a client to it
var port = server.address().port;
var sock = net.connect(port);
// Do some client stuff for a while, then finish the test
setTimeout(function() {
t.pass();
sock.end();
server.close();
}, 2000);
});
});
After the two seconds, the process will exit and the test will end successfully. I've also tested this with multiple client sockets open; as long as you end all client-side connections and then call .close() on the server, you are good.
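In tests you often don't control every client, so one possible variation is to track sockets on the server side and destroy them when stopping. This is a sketch under that assumption, not part of the answer above: createStoppableServer and server.stop are made-up names, and destroying sockets cuts off any request still in flight, which is usually acceptable in a test but not in production.

```javascript
const http = require('http');

// Wraps http.createServer and remembers every open socket.
function createStoppableServer(handler) {
  const server = http.createServer(handler);
  const sockets = new Set();

  server.on('connection', function (socket) {
    sockets.add(socket);
    socket.on('close', function () { sockets.delete(socket); });
  });

  // Stop accepting connections, then drop the ones that are still open
  // (for example idle keep-alive connections) so close() can complete.
  server.stop = function (callback) {
    server.close(callback);
    sockets.forEach(function (socket) { socket.destroy(); });
  };

  return server;
}

const server = createStoppableServer(function (req, res) { res.end('ok'); });
server.listen(0, function () {
  server.stop(function () {
    console.log('server stopped, process can exit');
  });
});
```

Once the listening socket and all connections are closed, nothing keeps the event loop alive and the process returns to the shell without process.exit().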
http.Server#close
https://nodejs.org/api/http.html#http_server_close_callback
module.exports = {
server: http.createServer(app) // Express App maybe ?
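.on('error', (e) => {
console.log('Oops! Something happened', e);
// `this` inside an arrow function here is not the exported object, so call via module.exports
module.exports.stopServer(); // Optionally stop the server gracefully
process.exit(1); // Or violently
}),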
// Start the server
startServer: function() {
Configs.reload();
this.server
.listen(Configs.PORT)
.once('listening', () => console.log('Server is listening on', Configs.PORT));
},
// Stop the server
stopServer: function() {
this.server
.close() // Won't accept new connection
.once('close', () => console.log('Server stopped'));
}
}
Notes:
"close" callback only triggers when all leftover connections have finished processing
Trigger process.exit in "close" callback if you want to stop the process too
To cause the node.js process to exit, use process.exit(status) as described in http://nodejs.org/api/process.html#process_process_exit_code
Update
I must have misunderstood.
You wrote: "...but I have no idea how to get the process spawned by node test.js to exit and return back to the shell."
process.exit() does this.
Unless you're using the child_process module, node.js runs in a single process. It does not "spawn" any further processes.
The fact that node.js continues to run even though there appears to be nothing for it to do is a feature of its "event loop" which continually loops, waiting for events to occur.
To halt the event loop, use process.exit().
UPDATE
After a few small modifications, such as the proper use of module.exports, addition of semicolons, etc., running your example on a Linux server (Fedora 11 - Leonidas) runs as expected and dutifully returns to the command shell.
lib/my_server.js
// file: `lib/my_server.js`
var util=require('util'),
http=require('http');
var LISTEN_PORT=3000;
function MyServer(){
http.Server.call(this, this.handle);
}
util.inherits(MyServer, http.Server);
MyServer.prototype.handle=function(req, res){
// code
};
MyServer.prototype.start=function(){
this.listen(LISTEN_PORT, function(){
console.log('Listening for HTTP requests on port %d.', LISTEN_PORT);
});
};
MyServer.prototype.stop=function(){
this.close(function(){
console.log('Stopped listening.');
});
};
module.exports=MyServer;
test.js
// file: `test.js`
var MyServer = require('./lib/my_server');
var my_server = new MyServer();
my_server.on('listening', function() {
my_server.stop();
});
my_server.start();
Output
> node test.js
Listening for HTTP requests on port 3000.
Stopped listening.
>
Final thoughts:
I've found that the conscientious use of statement-ending semicolons has saved me from a wide variety of pernicious, difficult to locate bugs.
While most (if not all) JavaScript interpreters provide something called "automatic semicolon insertion" (or ASI) based upon a well-defined set of rules (See http://dailyjs.com/2012/04/19/semicolons/ for an excellent description), there are several instances where this feature can inadvertently work against the intent of the programmer.
Unless you are very well versed in the minutia of JavaScript syntax, I would strongly recommend the use of explicit semicolons rather than relying upon ASI's implicit ones.
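Two small cases where ASI works against the programmer's intent, as an illustration (the function names are made up): a line break after return ends the statement, while a line that starts with [ is joined to the previous line.

```javascript
// Intended to return an object, but ASI ends the statement right after `return`,
// so the braces below are parsed as an unreachable block.
function returnsUndefined() {
  return
  { ok: true }
}

// Intended as two statements, but no semicolon is inserted before `[`,
// so this is parsed as `const first = values[0]`.
function indexesInsteadOfTwoStatements() {
  const values = [10, 20, 30]
  const first = values
  [0]
  return first
}

console.log(returnsUndefined())
console.log(indexesInsteadOfTwoStatements())
```

With explicit semicolons the second function would return the whole array, and the misplaced line break in the first would stand out more clearly.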

How can I execute a node.js module as a child process of a node.js program?

Here's my problem. I implemented a small script that does some heavy calculation, as a node.js module. So, if I type "node myModule.js", it calculates for a second, then returns a value.
Now, I want to use that module from my main Node.JS program. I could just put all the calculation in a "doSomeCalculation" function then do:
var myModule = require("./myModule");
myModule.doSomeCalculation();
But that would be blocking, thus it'd be bad. I'd like to use it in a non-blocking way, like DB calls natively are, for instance. So I tried to use child_process.spawn and exec, like this:
var spawn = require("child_process").spawn;
var ext = spawn("node ./myModule.js", function(err, stdout, stderr) { /* whatevs */ });
ext.on("exit", function() { console.log("calculation over!"); });
But, of course, it doesn't work. I tried to use an EventEmitter in myModule, emitting "calculationDone" events and trying to add the associated listener on the "ext" variable in the example above. Still doesn't work.
As for forks, they're not really what I'm trying to do. Forks would require putting the calculation-related code in the main program, forking, calculating in the child while the parent does whatever it does, and then how would I return the result?
So here's my question: can I use a child process to do some non-blocking calculation, when the calculation is put in a Node file, or is it just impossible? Should I do the heavy calculation in a Python script instead? In both cases, how can I pass arguments to the child process - for instance, an image?
I think what you're after is the child_process.fork() API.
For example, if you have the following two files:
In main.js:
var cp = require('child_process');
var child = cp.fork('./worker');
child.on('message', function(m) {
// Receive results from child process
console.log('received: ' + m);
});
// Send child process some work
child.send('Please up-case this string');
In worker.js:
process.on('message', function(m) {
// Do work (in this case just up-case the string)
m = m.toUpperCase();
// Pass results back to parent process
process.send(m);
});
Then to run main (and spawn a child worker process for the worker.js code ...)
$ node --version
v0.8.3
$ node main.js
received: PLEASE UP-CASE THIS STRING
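It doesn't matter what you use as the child (Node, Python, whatever); Node doesn't care. Just make sure that your calculation script exits after everything is done and the result is written to stdout.
The reason your code isn't working is that you're calling spawn as if it were exec. exec takes the whole command line as one string plus a callback that receives (err, stdout, stderr); spawn takes the command and an array of arguments, takes no such callback, and reports output through the child's stdout stream.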
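To make the difference concrete, here is a small sketch that runs the same calculation both ways. The inline script is a hypothetical stand-in for myModule.js, and viaExec / viaSpawn are made-up helper names; both calls return immediately, so the main program is not blocked.

```javascript
const { exec, spawn } = require('child_process');

// Hypothetical stand-in for myModule.js: prints its result to stdout and exits.
const script = 'console.log(6 * 7)';

// exec: one command string; buffered output is delivered to a callback.
function viaExec(callback) {
  const command = '"' + process.execPath + '" -e "' + script + '"';
  exec(command, function (err, stdout) {
    callback(err, stdout.trim());
  });
}

// spawn: command plus argument array; output arrives through streams.
function viaSpawn(callback) {
  const child = spawn(process.execPath, ['-e', script]);
  let output = '';
  let finished = false;
  function finish(err) {
    if (finished) return; // 'error' and 'close' can both fire
    finished = true;
    callback(err, output.trim());
  }
  child.stdout.on('data', function (chunk) { output += chunk; });
  child.on('error', finish);
  child.on('close', function (code) {
    finish(code === 0 ? null : new Error('exit code ' + code));
  });
}

viaExec(function (err, result) { console.log('exec result:', result); });
viaSpawn(function (err, result) { console.log('spawn result:', result); });
```

Arguments for the child can be appended to the command string for exec or to the array for spawn; for something large like an image, passing a file path is usually simpler than passing the data itself.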

Resources