I'm new to node.js, so please forgive what is probably a naive question :) What is the best way to set up a non-UI job written in node? The task I've created crawls web content based on an Azure queue (the queue message tells the job which content to crawl). All of the node examples I see are UI and request based, using http.createServer and listening on a specific port. While I can make this work, it doesn't seem right. It seems like I just need to create some sort of javascript setInterval loop (or something similar) that keeps looking at my queue. Any suggestions or examples that would push me in the right direction would be greatly appreciated.
Chris
I'm not really clear on what you're trying to do, but node doesn't depend on the http stack at all. If you just want to start node and have it process something, that is pretty straightforward. Your app.js could be as simple as:
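// 'worker' is the application-specific module described below
var queueWorker = require('worker');
var startWorker = function() {
    if(queueWorker.hasWork()) {
        // process the work, then come straight back to check again
        queueWorker.processQueue(startWorker);
    } else {
        // nothing to do: check again in one second
        setTimeout(startWorker, 1000);
    }
};
startWorker();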
What this is doing is setting up a worker loop where every second it will check to see if there is new work, and if there is start processing it. Once it is done processing the work, go back to the 1 second interval checking for new work.
You would have to create the worker module as the check for hasWork and the processing of said work is application dependent.
If you wanted to get a little more fancy, processQueue could spawn a new node process which is only responsible for actually processing the work. You could then keep track of the number of spawned workers versus CPU limitations and have a relatively simple node app which processes data in several processes in parallel.
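As a rough sketch of the "keep track of the number of spawned workers" idea, the helper below caps how many jobs run at once. The names createPool, runJob and crawl-job.js are made up for illustration and are not part of any library; runJob is whatever actually starts the work and calls done when it finishes.

```javascript
const os = require('os');

// Hypothetical helper: runs at most `limit` jobs at the same time.
// runJob(job, done) must start the job and call done() when it has finished.
function createPool(limit, runJob) {
  const pending = [];
  let active = 0;

  function next() {
    // Start jobs until the limit is reached or the backlog is empty
    while (active < limit && pending.length > 0) {
      const job = pending.shift();
      active++;
      runJob(job, function done() {
        active--;
        next();
      });
    }
  }

  return {
    push(job) { pending.push(job); next(); },
    stats() { return { active: active, pending: pending.length }; }
  };
}

// One worker per CPU core is a common starting point (an assumption, tune as needed)
const defaultLimit = Math.max(1, os.cpus().length);

// Example runJob that forks a hypothetical script 'crawl-job.js' per job:
// const { fork } = require('child_process');
// const runJob = (job, done) => fork('crawl-job.js', [job]).on('exit', done);
// const pool = createPool(defaultLimit, runJob);
```

The polling loop from the answer would then call pool.push(message) for each queue message instead of processing it inline.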
Related
I'm learning how to emit events using the NodeJS Event module but I'm struggling with a workflow question.
I'd like to listen for new posts created on a specific website using web scraping, as the site doesn't provide any other way.
For now, all I could find was to run a loop every X minutes and emit an event once it notices a difference. But that approach has several drawbacks:
How do you run it in a loop without blocking the whole script?
It means you have to wait before continuing in the loop
So, how should you listen to events such as a new post? Is looking for differences at regular intervals a good practice?
The scraping is not blocking the whole script: the loading itself is asynchronous, only the parsing part is synchronous.
If the scraping is done with promises, you can run several promises in parallel with Promise.all. You could also start the scraping with setInterval, so that it runs at regular intervals even if some of the scrapes are slow or fail.
If you really want to not block the event loop at all, you can run a worker for each scraping process.
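A minimal sketch of the "check at regular intervals and emit on a difference" idea, using Node's EventEmitter. PostWatcher, fetchPosts and the event names are hypothetical; fetchPosts stands for your own scraping function and is assumed to return a promise of an array of objects with an id.

```javascript
const { EventEmitter } = require('events');

class PostWatcher extends EventEmitter {
  constructor() {
    super();
    this.seen = new Set(); // ids of posts already reported
  }

  // Compare a freshly scraped list with what was seen before
  // and emit 'newPost' for every id not seen yet.
  check(posts) {
    for (const post of posts) {
      if (!this.seen.has(post.id)) {
        this.seen.add(post.id);
        this.emit('newPost', post);
      }
    }
  }

  // Poll with setInterval; a slow or failed scrape does not stop later ones.
  // Returns a function that stops the polling.
  start(fetchPosts, intervalMs) {
    const timer = setInterval(() => {
      fetchPosts()
        .then((posts) => this.check(posts))
        .catch((err) => this.emit('scrapeError', err));
    }, intervalMs);
    return () => clearInterval(timer);
  }
}
```

Note that the very first check reports every post as new; seed watcher.seen beforehand if that is not wanted.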
I have gone through many painful months with this issue and I am now ready to let this go to the bin of "what-a-great-lie-for-websites-nodejs-is"!!!
All NodeJS tutorials discuss how to create a website. When done, it works. For one person at a time though. All requests sent to the port will be blocked by the first come-first-serve situation. Why? Because most requests sent to the nodejs server will have to get parsed, data requested from the database, data calculated and parsed, response prepared and sent back to the ajax call. (this is a mere simple website example).
Same applies for authentication - a request is made, data is parsed, authentication takes place, session is created and sent back to the requester.
No matter how you sugar coat this - All requests are done this way. Yes you can employ async functionality which will shorten the time spent on some portions, yes you can try promises, yes you can employ clustering, yes you can employ forking/spawning, etc... The result is always the same at all times: port gets blocked.
I tried responding with a reference so that we can use sockets to pass the data back and matching it with the reference - that also blocked it.
The point of the matter is this: when you ask for help, everyone wants all sort of code examples, but never go to the task of helping with an actual answer that works. The whole wide world!!!!! Which leads me to the answer that nodeJS is not suitable for websites.
I read many requests regarding this and all have been met with: "you must code properly"..! Really? Is there no NodeJS skilled and experienced person who can lend an answer on this one????!!!!!
Most of the NodeJS coders come from the PHP side - All websites using PHP never have to utilise any workaround whatsoever in order to display a web page and it never blocks 2 people at the same time. Thanks to the web server.
So how come the NodeJS community cannot come to some sort of an answer on this one other than: "code carefully"???!!!
They want examples - each example is met with: "well that is a blocking code, do it another way", or "code carefully"!!! Come on people!!!
Think of the following scenario: One User, One page with 4 lists of records. Theoretically all lists should be filled up with records independently. What is happening because of how data is requested, prepared and responded, each list in reality is waiting for the next one to finish. That is on one session alone.
What about 2 people, 2 sessions at the same time?
So my question is this: is NodeJS suitable for a website and if it is, can anyone show and prove this with a short code??? If you can't prove it, then the answer is: "NodeJS is not suitable for websites!"
Here is an example based on the simplest tutorial and it is still blocking:
var express = require('express'),
    fs = require("fs");
var app = express();
app.get('/heavyload', function (req, res) {
    var file = '/media/sudoworx/www/sudo-sails/test.zip';
    res.send('Heavy Load');
    res.end();
    fs.readFile(file, 'utf8', function (err,fileContent) {
    });
});
app.get('/lightload', function (req, res) {
    var file = '/media/sudoworx/www/sudo-sails/test.zip';
    res.send('Light Load');
    res.end();
});
app.listen(1337, function () {
    console.log('Listening!')
});
Now, if you go to "/heavyload" it will immediately respond because that is the first thing sent to the browser, and then nodejs proceeds reading a heavy file (a large file). If you now go to the second call "/lightload" at the same time, you will see that it is waiting for the loading of the file to finish from the first call before it proceeds with the browser output. This is the simplest example of how nodejs simply fails in handling what otherwise would be simple in php and similar script languages.
Like mentioned before, I tried as many as 20 different ways to do this in my career of nodejs programmer. I totally love nodejs, but I cannot get past this obstacle... This is not a complaint - it is a call for help because I am at my end road with nodejs and I don't know what to do.
I thank you kindly.
So here is what I found out. I will answer it with an example of a blocking code:
for (var k = 0; k < 15000; k++){
    console.log('Something Output');
}
res.status(200).json({test:'Heavy Load'});
This will block because it has to do the for loop for a long time and then after it finished it will send the output.
Now if you do the same code like this it won't block:
function doStuff(){
    for (var k = 0; k < 15000; k++){
        console.log('Something Output');
    }
}
doStuff();
res.status(200).json({test:'Heavy Load'});
Why? Because the functions are run asynchronously...! So how will I then send the resulting response to the requesting client? Currently I am doing it as follows:
Run the doStuff function
Send a unique call reference which is then received by the ajax call on the client side.
Put the callback function of the client side into a waiting object.
Listen on a socket
When the doStuff function is completed, it should issue a socket message with the resulting response together with the unique reference
When the socket on the client side gets the message with the unique reference and the resulting response, it will then match it with the waiting callback function and run it.
Done! A bit of a workaround (as mentioned before), but it's working! It does require a socket to be listening. And that is my solution to this port-blocking situation in NodeJS.
Is there some other way? I am hoping someone answers with another way, because I am still feeling like this is some workaround. Eish! ;)
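One other way, offered as a sketch rather than a definitive fix: wrapping a loop in a function does not by itself make it asynchronous in JavaScript, since the function still runs to completion before anything else is handled. What does let other requests through is splitting the loop into chunks and yielding to the event loop between them with setImmediate. The helper name processInChunks and its parameters are invented for this illustration; the optional schedule argument only exists so the scheduling can be swapped out.

```javascript
// Hypothetical helper: handles `items` in slices of `chunkSize`,
// yielding to the event loop between slices, then calls onDone().
function processInChunks(items, chunkSize, handleItem, onDone, schedule) {
  schedule = schedule || setImmediate; // default: let pending I/O run in between
  let index = 0;

  function runChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      handleItem(items[index]);
    }
    if (index < items.length) {
      schedule(runChunk); // more work left: continue after other events
    } else {
      onDone(); // everything processed
    }
  }

  runChunk();
}

// Usage inside an Express handler (rows and work are placeholders):
// processInChunks(rows, 500, work, function () {
//   res.status(200).json({ test: 'Heavy Load' });
// });
```

With this shape the response is sent from the onDone callback of the same request, so no socket or call reference is needed to get the result back to the client.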
is NodeJS suitable for a website
Definitely yes.
can anyone show and prove this with a short code
No.
All websites using PHP never have to utilise any workaround whatsoever
in order to display a web page and it never blocks 2 people at the
same time.
Node.js doesn't require any workarounds either, it simply works without blocking.
To more broadly respond to your question/complaint:
A single node.js machine can easily serve as a web-server for a website and handle multiple sessions (millions actually) without any need for workarounds.
If you're coming from a PHP web-server, then instead of trying to migrate existing code to a new Node website, first play with a simple online website example of Node.js + express. If that works well, you can start adding code that requires long-running operations like reading from DBs or reading/writing files, and verify that visitors aren't being blocked (they shouldn't be).
See this tutorial on how to get started.
UPDATE FOR EXAMPLE CODE
To fix the supplied example code, convert your fs.readFile call to fs.createReadStream. readFile is not recommended for large files. I don't think readFile literally blocked anything, but allocating and moving that many bytes at once may choke the server; createReadStream reads in chunks instead, which is much easier on the CPU and RAM:
var rstream = fs.createReadStream(file);
var length = 0;
rstream.on('data', function (chunk) {
    length += chunk.length;
    // Do something with the chunk
});
rstream.on('end', function () { // done
    console.log('file read! length = ' + length);
});
After switching your code to createReadStream I'm able to serve continuous calls to heavyload / lightload in ~3ms each
THE BETTER ANSWER I SHOULD HAVE WRITTEN
Node.js is a single-process architecture, but it has multi-process capabilities: the cluster module allows you to write master/workers code that distributes the load across multiple worker processes.
You can also use pm2 to do the clustering for you. It has a built-in load balancer to distribute the work, and also allows for scaling up/down without downtime, see this.
In your case, while one process is reading/writing a large file, other processes can accept incoming requests and handle them.
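A minimal sketch of the cluster approach, assuming one worker per CPU core. startCluster is a name made up for this example, and the port in the usage line is simply the one from the question.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Newer Node versions expose isPrimary; older ones only have isMaster
const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

function startCluster(port) {
  if (isPrimary) {
    // Primary process: fork one worker per CPU core
    os.cpus().forEach(function () {
      cluster.fork();
    });
    cluster.on('exit', function (worker) {
      console.log('worker ' + worker.process.pid + ' exited');
    });
  } else {
    // Worker process: all workers share the same listening port
    http.createServer(function (req, res) {
      res.end('Handled by process ' + process.pid + '\n');
    }).listen(port);
  }
}

// Usage: call once at startup; the same file runs in the primary and in every worker.
// startCluster(1337);
```

While one worker is busy with a heavy request, incoming connections can still be accepted by the other workers.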
I have a node program that does a lot of heavy synchronous work. The work that needs to be done could easily be split into several parts. I would like to utilize all processor cores on my machine for this. Is this possible?
From the docs on child processes and clusters I see no obvious solution. Child processes seem to be focused on running external programs, and clusters only work for incoming http connections (or have I misunderstood that?).
I have a simple function var output = fn(input) and would just like to run it several times, spread all the calls across the cores on my machine and provide the result in a callback. Can that be done?
Yes, child processes and clusters are the way to do that. There are a couple of ways of implementing a solution to your problem.
Your server creates a queue and manages that queue. Whenever you need to call your function, you will drop it into the queue. You will then process the queue N items at a time, where N equals the number of your cores. When you start processing, you will spawn a child process, probably either using spawn or exec, with the argument being another standalone Node.js script, along with any additional parameters (it's just a command line call, basically). Inside that script you will do your work, and emit the result back to the server. The worker is then freed up.
You can create a dedicated server with cluster, where all it will do is run your function. With the cluster module, you can (once again) create N other workers, and delegate work to these workers.
Now this may seem like a lot of work, and it is. For that reason you should use an existing library, as this is, for the most part, a solved problem at this point. I really like redis-based queues, so if you're interested in that see this answer for some queue recommendations.
I'm pretty new to Node.js, though I've been writing javascript for years. I'm more than open to any node advice for best-practices that I'm not following, or other rethinks. That said:
I'm building a system in which a user creates a reservation, and simultaneously submits a task for my firebase-queue to pick up. This queue has multiple specs associated with it. In turn, it's supposed to:
check availability, and in response confirm/throw an alert on the reservation and update the firebase data accordingly.
Update the users reservations, which is an index of reservation object keys, and removing any redundant ones.
Use node-schedule to create dated functions to send notifications about the pending expiration of their reservation.
However, when I run my script, only one of the firebase-queues that I instantiate runs. I can look in the dashboard and see that the progress is at 100, the _state is the new finished_state (which is the next spec's start_state), but that next queue won't pick up the task and process it.
If I quit my script and rerun it, that next queue will work fine. And then the queue after that won't work, until I repeat the act of quitting and rerunning the script. I can continue this until the entire task sequence completes, so I don't think the specs or the code being executed itself are blocking. I don't see any error states spring up, anyway.
From the documentation it looks like I should be able to write the script this way, with multiple calls to 'new Queue(queueRef, options, function(data, progress, resolve, reject)...' and they'll just run each task as I set them in their options, all of which are basically:
var options = {
    'specId': 'process_reservation',
    'numWorkers': 5,
    'sanitize': true,
    'suppressStack': false
};
but, nope. I guess I can spawn child-processes for each of the queue instances, but I'm not sure if that's an extreme reaction to the issues I'm seeing, and I'm not sure if it would complicate the node structure in terms of shared module exports. Also, I'm not sure if it'll start eating into my concurrent connection count.
Thanks!
I'm working on an application that processes text files in tab-separated form containing item details. The files can be large (up to one or two million lines), and since the processing time can be long, I want to update a progress bar so the user knows that the application didn't just hang or, better, to give an idea of the remaining time.
I've already researched and I know how to update a simple progress bar but the examples tend to be simplistic as to call something like progressBar.setProgress(counter++, 100) using Timer, there are other examples where the logic is simple and written in the same class. I'm also new to the language having done mostly Java and some JavaScript in the past, among others.
I wrote the logic for processing the file (validation of input and creation of output files). But then, if I call the processing logic in the main class the update will be done at the end of processing (flying by so fast from 0 to 100) no matter if I update variables and try to dispatch events or things like that; the bar won't reflect the processing progress.
Would processing the input by chunks be a valid approach? And then, I'm not sure if the processing delay of one data chunk won't affect the processing of the next chunk and so on, because the timer tick is set to be 1 millisecond and the chunk processing time would be longer than that. Also, if the order of the input won't be affected or the result will get corrupted in some way. I've read multithreading is not supported in the language, so should that be a concern?
I already coded the logic described before and it seems to work:
// called by mouse click event
function processInput():void {
    timer = new Timer(1);
    timer.addEventListener(TimerEvent.TIMER, processChunk);
    timer.start();
}
function processChunk(event:TimerEvent):void {
    // code to calculate start and end index for the data chunk,
    // everytime processChunk is executed these indexes are updated
    // slice() copies the range; splice() would remove items and shrink the array
    var dataChunk:Array = wholeInputArray.slice(index0, index1);
    processorObj.processChunk(dataChunk);
    progressBar.setProgress(index0, wholeInputArray.length);
    progressBar.label = index0 + " processed items";
    if (index1 >= wholeInputArray.length) { // no more data to process
        timer.stop();
        progressBar.setProgress(wholeInputArray.length, wholeInputArray.length);
        progressBar.label = "Processing done";
        // do post processing here: show results, etc.
    }
}
The declaration for the progress bar is as follows:
<mx:ProgressBar id="progressBar" x="23" y="357" width="411" direction="right"
labelPlacement="center" mode="manual" indeterminate="false" />
I tested it with an input of 50000 lines and it seems to work generating the same result as the other approach that processes the input at once. But, would that be a valid approach or is there a better approach?
Thanks in advance.
Your solution is good, I use it most of the time.
But multithreading is now supported in AS3 (for desktop and web only for the moment).
Have a look at: Worker documentation and Worker example.
Hope that helps :)
May I ask whether this Timer, as is, is your working Timer? If so, you are in for a lot of trouble with your application in the long run when it comes to reloading and getting the Timer to stop, close, etc. The event listener is incomplete and will certainly cause problems!
I recommend getting this right before going further. I know this from experience: some of my own AIR applications need several hundred of them running one after another in modules, and so do some of my web apps, though not quite so intensely.
I'm sure smoother execution will be the reward! Regards, aktell
Use Workers. Because splitting data into chunks and then processing it is a valid but quite cumbersome approach and with workers you can simply spawn a background worker, do all the parsing there and return a result, all without blocking GUI. Worker approach should require less time to do parsing, because there is no need to stop parser and wait for the next frame.
Workers would be an ideal solution, but quite complicated to set up. If you're not up to it right now, here's a PseudoThread solution I use in similar situations which you can probably get up and running in 5 minutes:
Pseudo Threads
It uses EnterFrame events for balancing between work and letting the UI do its thing, and you can manually update the progress bar within your 'thread' code. I think it would be easily adapted for your needs since your data is easily sliced.
Without using Workers (which it seems you are not yet familiar with) AS3 will behave single threaded. Your timers will not overlap. If one of your chunks takes longer than the 1 ms timer period to complete, the next timer event will be processed when it can. It will not queue up further events if a chunk takes longer than your timer period (assuming your processing code is blocking).
The previous answers show the "correct" solution to this, but this might get you where you need to be faster.