Is NodeJS suitable for websites (i.e. port blocking)? - node.js

I have gone through many painful months with this issue and I am now ready to let this go to the bin of "what-a-great-lie-for-websites-nodejs-is"!!!
All NodeJS tutorials discuss how to create a website. When done, it works. For one person at a time though. All requests sent to the port will be blocked by the first-come-first-served situation. Why? Because most requests sent to the nodejs server have to get parsed, data requested from the database, data calculated and parsed, the response prepared and sent back to the ajax call. (This is a mere simple website example.)
Same applies for authentication - a request is made, data is parsed, authentication takes place, session is created and sent back to the requester.
No matter how you sugar coat this - All requests are done this way. Yes you can employ async functionality which will shorten the time spent on some portions, yes you can try promises, yes you can employ clustering, yes you can employ forking/spawning, etc... The result is always the same at all times: port gets blocked.
I tried responding with a reference so that we can use sockets to pass the data back and match it with the reference - that also blocked it.
The point of the matter is this: when you ask for help, everyone wants all sorts of code examples, but nobody ever gets to the task of helping with an actual answer that works. The whole wide world!!!!! Which leads me to the answer that nodeJS is not suitable for websites.
I read many requests regarding this and all have been met with: "you must code properly"..! Really? Is there no NodeJS skilled and experienced person who can lend an answer on this one????!!!!!
Most of the NodeJS coders come from the PHP side - All websites using PHP never have to utilise any workaround whatsoever in order to display a web page and it never blocks 2 people at the same time. Thanks to the web server.
So how come the NodeJS community cannot come to some sort of an answer on this one other than: "code carefully"???!!!
They want examples - each example is met with: "well that is blocking code, do it another way", or "code carefully"!!! Come on people!!!
Think of the following scenario: one user, one page with 4 lists of records. Theoretically all lists should be filled up with records independently. What happens, because of how data is requested, prepared and responded to, is that each list in reality waits for the next one to finish. That is on one session alone.
What about 2 people, 2 sessions at the same time?
So my question is this: is NodeJS suitable for a website and if it is, can anyone show and prove this with a short code??? If you can't prove it, then the answer is: "NodeJS is not suitable for websites!"
Here is an example based on the simplest tutorial and it is still blocking:
var express = require('express'),
    fs = require("fs");

var app = express();

app.get('/heavyload', function (req, res) {
  var file = '/media/sudoworx/www/sudo-sails/test.zip';
  res.send('Heavy Load');
  res.end();
  fs.readFile(file, 'utf8', function (err, fileContent) {
  });
});

app.get('/lightload', function (req, res) {
  var file = '/media/sudoworx/www/sudo-sails/test.zip';
  res.send('Light Load');
  res.end();
});

app.listen(1337, function () {
  console.log('Listening!');
});
Now, if you go to "/heavyload" it will immediately respond, because that is the first thing sent to the browser, and then nodejs proceeds to read a heavy (large) file. If you now go to the second call "/lightload" at the same time, you will see that it waits for the file read from the first call to finish before it proceeds with the browser output. This is the simplest example of how nodejs simply fails at handling what would otherwise be simple in php and similar scripting languages.
Like mentioned before, I have tried as many as 20 different ways to do this in my career as a nodejs programmer. I totally love nodejs, but I cannot get past this obstacle... This is not a complaint - it is a call for help because I am at the end of my road with nodejs and I don't know what to do.
I thank you kindly.

So here is what I found out. I will answer it with an example of blocking code:
for (var k = 0; k < 15000; k++){
  console.log('Something Output');
}
res.status(200).json({test:'Heavy Load'});
This will block because it has to run the for loop for a long time, and only after it has finished will it send the output.
Now if you do the same code like this it won't block:
function doStuff(){
  for (var k = 0; k < 15000; k++){
    console.log('Something Output');
  }
}

doStuff();
res.status(200).json({test:'Heavy Load'});
Why? Because the functions are run asynchronously...! So how will I then send the resulting response to the requesting client? Currently I am doing it as follows:
1. Run the doStuff function.
2. Send a unique call reference which is then received by the ajax call on the client side.
3. Put the callback function of the client side into a waiting object.
4. Listen on a socket.
5. When the doStuff function is completed, it should issue a socket message with the resulting response together with the unique reference.
6. When the socket on the client side gets the message with the unique reference and the resulting response, it will then match it with the waiting callback function and run it.
Done! A bit of a workaround (as mentioned before), but it's working! It does require a socket to be listening. And that is my solution to this port-blocking situation in NodeJS.
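A rough sketch of this workaround, assuming socket.io is already wired up on both sides (io is the socket.io server, socket the client-side connection, and the jQuery $.get is just a stand-in for whatever ajax helper is in use), with setImmediate used to defer the heavy work so the HTTP response goes out first:
// server side: respond with a reference immediately, emit the result later
app.get('/heavy', function (req, res) {
  var ref = Date.now() + '-' + Math.random(); // unique call reference
  res.json({ ref: ref });                     // the ajax call gets this at once

  setImmediate(function () {                  // defer the heavy work
    doStuff();                                // the long-running part
    io.emit('job-done', { ref: ref, result: { test: 'Heavy Load' } });
  });
});

// client side: match the socket message to the waiting callback by reference
var socket = io(); // socket.io client connection
var waiting = {};

socket.on('job-done', function (msg) {
  var cb = waiting[msg.ref];
  if (cb) { cb(msg.result); delete waiting[msg.ref]; }
});

$.get('/heavy', function (data) {
  waiting[data.ref] = function (result) { console.log(result); };
});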
Is there some other way? I am hoping someone answers with another way, because I am still feeling like this is some workaround. Eish! ;)

is NodeJS suitable for a website
Definitely yes.
can anyone show and prove this with a short code
No.
All websites using PHP never have to utilise any workaround whatsoever in order to display a web page and it never blocks 2 people at the same time.
Node.js doesn't require any workarounds either; it simply works without blocking.
To more broadly respond to your question/complaint:
A single node.js machine can easily serve as a web-server for a website and handle multiple sessions (millions actually) without any need for workarounds.
If you're coming from a PHP web server, then instead of trying to migrate existing code to a new Node website, first play with a simple online website example of Node.js + Express. If that works well, you can start adding code that requires long-running operations like reading from databases or reading/writing files, and verify that visitors aren't being blocked (they shouldn't be).
See this tutorial on how to get started.
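For instance, a minimal sketch along those lines (the route names and file path are just placeholders): one route kicks off a slow asynchronous file read, and it does not block the other route while the read is in flight:
var express = require('express');
var fs = require('fs');
var app = express();

app.get('/slow', function (req, res) {
  // The read runs in the background; the event loop stays free meanwhile.
  fs.readFile('/tmp/some-large-file.bin', function (err, data) {
    if (err) return res.status(500).send('read failed');
    res.send('read ' + data.length + ' bytes');
  });
});

app.get('/fast', function (req, res) {
  res.send('instant'); // answered immediately, even while /slow is reading
});

app.listen(3000, function () {
  console.log('listening on 3000');
});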
UPDATE FOR EXAMPLE CODE
To fix the supplied example code, you should convert your fs.readFile call to fs.createReadStream. readFile is not recommended for large files; I don't think readFile literally blocked anything, but the need to allocate and move large amounts of bytes can choke the server. createReadStream uses chunks instead, which is much easier on the CPU and RAM:
var rstream = fs.createReadStream(file);
var length = 0;

rstream.on('data', function (chunk) {
  length += chunk.length;
  // Do something with the chunk
});

rstream.on('end', function () { // done
  console.log('file read! length = ' + length);
});
After switching your code to createReadStream I'm able to serve continuous calls to heavyload / lightload in ~3ms each.
THE BETTER ANSWER I SHOULD HAVE WRITTEN
Node.js has a single-process architecture, but it has multi-process capabilities: using the cluster module, you can write master/worker code that distributes the load across multiple workers in multiple processes.
You can also use pm2 to do the clustering for you; it has a built-in load balancer to distribute the work, and it also allows for scaling up/down without downtime, see this.
In your case, while one process is reading/writing a large file, other processes can accept incoming requests and handle them.
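A minimal sketch of the cluster approach (the worker count and port are only illustrative):
var cluster = require('cluster');
var http = require('http');
var os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU; the master only supervises.
  os.cpus().forEach(function () { cluster.fork(); });
  cluster.on('exit', function (worker) {
    console.log('worker ' + worker.process.pid + ' died, starting a new one');
    cluster.fork();
  });
} else {
  // Each worker runs its own server; connections are spread across workers.
  http.createServer(function (req, res) {
    res.end('handled by pid ' + process.pid);
  }).listen(1337);
}
With pm2 you get a similar effect without writing the master code yourself, e.g. pm2 start app.js -i max starts one instance per CPU in cluster mode.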

Related

Synchronous Blocks in Node.js

I want to know if it's possible to have synchronous blocks in a Node.js application. I'm a total newbie to Node, but I couldn't find any good answers for the behavior I'm looking for specifically.
My actual application will be a little more involved, but essentially what I want to do is support a GET request and a POST request. The POST request will append an item to an array stored on the server and then sort the array. The GET request will return the array. Obviously, with multiple GETs and POSTs happening simultaneously, we need a correctness guarantee. Here's a really simple example of what the code would theoretically look like:
var arr = [];

app.get(/*URL 1*/, function (req, res) {
  res.json(arr);
});

app.post(/*URL 2*/, function (req, res) {
  var itemToAdd = req.body.item;
  arr.push(itemToAdd);
  arr.sort();
  res.status(200).end(); // status() alone doesn't send a response; end it
});
What I'm worried about is a GET request returning the array before it is sorted but after the item is appended or, even worse, returning the array as it's being sorted. In a language like Java I would probably just use a ReadWriteLock. From what I've read about Node's asynchrony, it doesn't appear that arr will be accessed in a way that preserves this behavior, but I'd love to be proven wrong.
If it's not possible to modify the code I currently have to support this behavior, are there any other alternatives or workarounds to get the application to do what I want it to do?
What I'm worried about is a GET request returning the array before it is sorted but after the item is appended or, even worse, returning the array as it's being sorted.
In the case of your code here, you don't have to worry about that (although read on because you may want to worry about other things!). Node.js is single-threaded so each function will run in its entirety before returning control to the Event Loop. Because all your array manipulation is synchronous, your app will wait until the array manipulation is done before answering a GET request.
One thing to watch out for then, of course, is if (for example) the .sort() takes a long time. If it does, your server will block while that is going on. And this is why people tend to favor asynchronous operations instead. But if your array is guaranteed to be small and/or this is an app with a limited number of users (say, it's an intranet app for a small company), then what you're doing may work just fine.
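For the case where the sort does become slow, a rough sketch of one way to push it off the main thread, using the worker_threads module (Node 12+; sortInWorker is a made-up helper, and concurrent POSTs arriving mid-sort would still need extra care):
const { Worker } = require('worker_threads');

// Sort a copy of the array in a separate thread so a long sort can't block
// the event loop while other requests are being handled.
function sortInWorker(items) {
  return new Promise(function (resolve, reject) {
    const worker = new Worker(
      "const { parentPort, workerData } = require('worker_threads');" +
      "parentPort.postMessage(workerData.slice().sort());",
      { eval: true, workerData: items }
    );
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// Inside the POST handler from the question one could then do:
// sortInWorker(arr).then(function (sorted) { arr = sorted; res.status(200).end(); });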
To get a good understanding of the whole single-threaded + event loop thing, I recommend Philip Roberts's talk on the event loop.

Getting data out of a MongoDB call [duplicate]

This question already has answers here:
Why is my variable unaltered after I modify it inside of a function? - Asynchronous code reference
(7 answers)
Closed 5 years ago.
I am unable to retrieve data from my calls to MongoDB.
The calls work as I can display results to the console but when I try to write/copy those results to an external array so they are usable to my program, outside the call, I get nothing.
EVERY example that I have seen does all its work within the connection loop. I cannot find any examples where results are copied to an array (either global or passed in), the connection ends, and the program continues processing the external array.
Most of the sample code out there is either too simplistic (i.e. just console.log within the connection loop) or way too complex, with examples of how to make express api routes. I don't need this as I am doing old-fashioned serial batch processing.
I understand that Mongo is built to be asynchronous but I should still be able to work with it.
var MongoClient = require('mongodb').MongoClient;
var assert = require('assert');
var array = [];

MongoClient.connect('mongodb://localhost:27017/Lessons', function (err, db) {
  assert.equal(err, null);
  console.log("Connected to the 'Lessons' database");
  var collection = db.collection('students');
  collection.find().limit(10).toArray(function(err, docs) {
    // console.log(docs);
    array = docs.slice(0); // Cloning array
    console.log(array);
    db.close();
  });
});

console.log('database is closed');
console.log(array);
It looks like I'm trying to log the data before the loop has finished. But how to synchronize the timing?
If somebody could explain this to me I'd be really grateful as I've been staring at this stuff for days and am really feeling stupid.
From the code you have shared, do you want the array to display in the console.log at the end? This will not work with your current setup as the 2 console.log's at the end will run before the query to your database is complete.
You should grab your results with a callback function. If you're not sure what those are, you will need to learn, as mongo / node use them everywhere. Basically javascript is designed to run really fast, and won't wait for something to finish before going to the next line of code.
This tutorial helped me a lot when I was first learning: https://zellwk.com/blog/crud-express-mongodb/
Could you let us know what environment you are running this mongo request? It would give more context because right now I'm not sure how you are using mongo.
thanks for the quick response.
Environment is Windows7 with an instance of mongod running in the background so I'm connecting to localhost. I'm using a db that I created but you can use any collection to run the sample code.
I did say I thought it was a timing thing. Your observation "the 2 console.log's at the end will run before the query to your database is complete" really clarified the problem for me.
I replaced the code at the end, after the connect() with the following code:
function waitasecond(){
  console.log(array);
}

setTimeout(waitasecond, 2000);
And the array is fully populated. This suggests that what I was trying to do, at least the way I wanted to do it, is not possible. I think I have two choices.
1. Sequential processing (as I originally conceived) - I would have to put in some kind of time delay to let the db call finish before commencing.
2. Create a function with all the code for the processing that needs to be done and call it from inside the database callback when the database returns the data.
The first option is a bit smelly. I wouldn't want to see that in production, so I guess I'll take the second option.
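A minimal sketch of that second option, reusing the connection code from the question (processResults is just a placeholder name for the rest of the batch processing):
var MongoClient = require('mongodb').MongoClient;
var assert = require('assert');

function processResults(students) {
  // ... the rest of the batch processing, run only once the data exists
  console.log(students);
}

MongoClient.connect('mongodb://localhost:27017/Lessons', function (err, db) {
  assert.equal(err, null);
  var collection = db.collection('students');
  collection.find().limit(10).toArray(function (err, docs) {
    db.close();
    processResults(docs); // hand the data to the rest of the program
  });
});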
Thanks for the recommended link. I did a quick look and the problem, for me, is that it describes a very common pattern that relies on express listening for router calls to respond. The processing I'm doing has nothing to do with router calls.
Ah, for the good old days of synchronous IO.

How do Node.js Streams work?

I have a question about Node.js streams - specifically how they work conceptually.
There is no lack of documentation on how to use streams. But I've had difficulty finding how streams work at the data level.
My limited understanding of web communication, HTTP, is that full "packages" of data are sent back and forth. Similar to an individual ordering a company's catalogue, a client sends a GET (catalogue) request to the server, and the server responds with the catalogue. The browser doesn't receive a page of the catalogue, but the whole book.
Are node streams perhaps multipart messages?
I like the REST model - especially that it is stateless. Every single interaction between the browser and server is completely self contained and sufficient. Are node streams therefore not RESTful? One developer mentioned the similarity with socket pipes, which keep the connection open. Back to my catalogue ordering example, would this be like an infomercial with the line "But wait! There's more!" instead of the fully contained catalogue?
A large part of streams is the ability for the receiver 'down-stream' to send messages like 'pause' & 'continue' upstream. What do these messages consist of? Are they POST?
Finally, my limited visual understanding of how Node works includes this event loop. Functions can be placed on separate threads from the thread pool, and the event loop carries on. But shouldn't sending a stream of data keep the event loop occupied (i.e. stopped) until the stream is complete? How is it ALSO keeping watch for the 'pause' request from downstream? Does the event loop place the stream on another thread from the pool and, when it encounters a 'pause' request, retrieve the relevant thread and pause it?
I've read the node.js docs, completed the nodeschool tutorials, built a heroku app, purchased TWO books (real, self contained, books, kinda like the catalogues spoken before and likely not like node streams), asked several "node" instructors at code bootcamps - all speak about how to use streams but none speak about what's actually happening below.
Perhaps you have come across a good resource explaining how these work? Perhaps a good anthropomorphic analogy for a non CS mind?
The first thing to note is: node.js streams are not limited to HTTP requests. HTTP requests / Network resources are just one example of a stream in node.js.
Streams are useful for everything that can be processed in small chunks. They allow you to process potentially huge resources in smaller chunks that fit into your RAM more easily.
Say you have a file (several gigabytes in size) and want to convert all lowercase into uppercase characters and write the result to another file. The naive approach would read the whole file using fs.readFile (error handling omitted for brevity):
fs.readFile('my_huge_file', function (err, data) {
  var convertedData = data.toString().toUpperCase();
  fs.writeFile('my_converted_file', convertedData);
});
Unfortunately this approach will easily overwhelm your RAM as the whole file has to be stored before processing it. You would also waste precious time waiting for the file to be read. Wouldn't it make sense to process the file in smaller chunks? You could start processing as soon as you get the first bytes while waiting for the hard disk to provide the remaining data:
var readStream = fs.createReadStream('my_huge_file');
var writeStream = fs.createWriteStream('my_converted_file');

readStream.on('data', function (chunk) {
  var convertedChunk = chunk.toString().toUpperCase();
  writeStream.write(convertedChunk);
});

readStream.on('end', function () {
  writeStream.end();
});
This approach is much better:
You will only deal with small parts of data that will easily fit into your RAM.
You start processing once the first byte arrives and don't waste time doing nothing but waiting.
Once you open the stream node.js will open the file and start reading from it. Once the operating system passes some bytes to the thread that's reading the file it will be passed along to your application.
Coming back to the HTTP streams:
The first issue is valid here as well. It is possible that an attacker sends you large amounts of data to overwhelm your RAM and take down (DoS) your service.
However the second issue is even more important in this case:
The network may be very slow (think smartphones) and it may take a long time until everything is sent by the client. By using a stream you can start processing the request and cut response times.
On pausing the HTTP stream: This is not done at the HTTP level, but way lower. If you pause the stream node.js will simply stop reading from the underlying TCP socket.
What is happening then is up to the kernel. It may still buffer the incoming data, so it's ready for you once you finished your current work. It may also inform the sender at the TCP level that it should pause sending data. Applications don't need to deal with that. That is none of their business. In fact the sender application probably does not even realize that you are no longer actively reading!
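A tiny sketch of that (the port and delay are arbitrary): pausing an incoming request stream just makes node stop reading the socket for a while; the kernel buffers or throttles the sender in the meantime:
var http = require('http');

http.createServer(function (req, res) {
  req.on('data', function (chunk) {
    req.pause();                                    // stop emitting 'data' events
    setTimeout(function () { req.resume(); }, 100); // pretend to do slow work
  });
  req.on('end', function () { res.end('done'); });
}).listen(3000);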
So it's basically about being provided data as soon as it is available, but without overwhelming your resources. The underlying hard work is done either by the operating system (e.g. net, fs, http) or by the author of the stream you are using (e.g. zlib which is a Transform stream and usually bolted onto fs or net).
The chart below seems to be a pretty accurate 10,000-foot overview / diagram of the node streams class.
It represents streams3, contributed by Chris Dickinson.
So first of all, what are streams?
Well, with streams we can process (meaning read and write) data piece by piece without completing the whole read or write operation. Therefore we don't have to keep all the data in memory to do these operations.
For example, when we read a file using streams, we read part of the data, do something with it, then free our memory, and repeat this until the entire file has been processed. Or think of YouTube or Netflix, which are both called streaming companies because they stream video using the same principle.
So instead of waiting until the entire video file loads, the processing is done piece by piece or in chunks so that you can start watching even before the entire file has been downloaded. So the principle here is not just about Node.JS. But universal to computer science in general.
So as you can see, this makes streams the perfect candidate for handling large volumes of data, like for example video, or data that we're receiving piece by piece from an external source. Also, streaming makes data processing more efficient in terms of memory, because there is no need to keep all the data in memory, and also in terms of time, because we can start processing the data as it arrives, rather than waiting until everything arrives.
How they are implemented in Node.JS:
So in Node, there are four fundamental types of streams:
readable streams, writable streams, duplex streams, and transform streams. But the readable and writable ones are the most important ones; readable streams are the ones from which we can read and consume data. Streams are everywhere in the core Node modules. For example, the data that comes in when an http server gets a request is actually a readable stream, so all the data that is sent with the request comes in piece by piece and not in one large piece. Also, another example from the file system is that we can read a file piece by piece by using a read stream from the fs module, which can actually be quite useful for large text files.
Well, another important thing to note is that streams are actually instances of the EventEmitter class. Meaning that all streams can emit and listen to named events. In the case of readable streams, they can emit, and we can listen to many different events. But the most important two are the data and the end events. The data event is emitted when there is a new piece of data to consume, and the end event is emitted as soon as there is no more data to consume. And of course, we can then react to these events accordingly.
Finally, besides events, we also have important functions that we can use on streams. And in the case of readable streams, the most important ones are the pipe and the read functions. The super important pipe function, which basically allows us to plug streams together, passing data from one stream to another without having to worry much about events at all.
Next up, writeable streams are the ones to which we can write data. So basically, the opposite of readable streams. A great example is the http response that we can send back to the client and which is actually a writeable stream. So a stream that we can write data into. So when we want to send data, we have to write it somewhere, right? And that somewhere is a writeable stream, and that makes perfect sense, right?
For example, if we wanted to send a big video file to a client, we would just like Netflix or YouTube do. Now about events, the most important ones are the drain and the finish events. And the most important functions are the write and end functions.
About duplex streams. They're simply streams that are both readable and writeable at the same time. These are a bit less common. But anyway, a good example would be a web socket from the net module. And a web socket is basically just a communication channel between client and server that works in both directions and stays open once the connection has been established.
Finally, transform streams are duplex streams, so streams that are both readable and writeable, which at the same time can modify or transform the data as it is read or written. A good example of this one is the zlib core module to compress data which actually uses a transform stream.
Node implements these http requests and responses as streams, and we can then consume them using the events and functions that are available for each type of stream. We could of course also implement our own streams and then consume them using these same events and functions.
Now let's try an example:
const fs = require('fs');
const server = require('http').createServer();

server.on('request', (req, res) => {
  fs.readFile('./txt/long_file.txt', (err, data) => {
    if (err) console.log(err);
    res.end(data);
  });
});

server.listen('8000', '127.0.0.1', () => {
  console.log(this);
});
Suppose long_file.txt contains 1000000K lines and each line contains more than 100 words, so this is a huge file with a big chunk of data. Now in the above example the problem is that by using the readFile() function node will load the entire file into memory, because only after loading the whole file into memory can node transfer the data as a response object.
When the file is big, and when there are a ton of requests hitting your server, the node process will very quickly run out of resources and your app will quit working; everything will crash.
Let's try to find a solution by using stream:
const fs = require('fs');
const server = require('http').createServer();

server.on('request', (req, res) => {
  const readable = fs.createReadStream('./txt/long_file.txt');
  readable.on('data', chunk => {
    res.write(chunk);
  });
  readable.on('end', () => {
    res.end();
  });
  readable.on('error', err => {
    console.log(err);
    res.statusCode = 500;
    res.end('File not found');
  });
});

server.listen('8000', '127.0.0.1', () => {
  console.log(this);
});
Well, in the above example with the stream, we are effectively streaming the file: we read one piece of the file, and as soon as that's available, we send it right to the client using the write method of the response stream. Then when the next piece is available, that piece will be sent, and so on until the entire file is read and streamed to the client.
When the stream has finished reading the data from the file, the end event is emitted to signal that no more data will be written to this writable stream.
With the above practice we solved the previous problem, but still, there is a huge problem remaining with the above example, which is called backpressure.
The problem is that our readable stream, the one that we are using to read the file from disk, is much, much faster than actually sending the result over the network with the response writable stream. This will overwhelm the response stream, which cannot handle all this incoming data so fast, and this problem is called backpressure.
The solution is to use the pipe method; it will handle the speed of the data coming in and the speed of the data going out.
const fs = require('fs');
const server = require('http').createServer();

server.on('request', (req, res) => {
  const readable = fs.createReadStream('./txt/long_file.txt');
  readable.pipe(res);
});

server.listen('8000', '127.0.0.1', () => {
  console.log(this);
});
I think you are overthinking how all this works and I like it.
What streams are good for
Streams are good for two things:
when an operation is slow and it can give you partial results as it gets them. For example reading a file: it is slow because HDDs are slow, and it can give you parts of the file as it reads it. With streams you can use these parts of the file and start to process them right away.
they are also good to connect programs together (read functions). Just as in the command line you can pipe different programs together to produce the desired output. Example: cat file | grep word.
How they work under the hood...
Most of these operations that take time to process, and can give you partial results as they become available, are not done by Node.js; they are done by the V8 JS Engine, and it only hands those results to JS for you to work with.
To understand your http example you need to understand how http works
There are different encodings a web page can be sent as. In the beginning there was only one way, where the whole page was sent when it was requested. Now there are more efficient encodings to do this. One of them is chunked, where parts of the web page are sent until the whole page is sent. This is good because a web page can be processed as it is received. Imagine a web browser: it can start to render websites before the download is complete.
Your .pause and .continue questions
First, Node.js streams only work within the same Node.js program. Node.js streams can't interact with a stream in another server or even program.
That means that in the example below, Node.js can't talk to the webserver. It can't tell it to pause or resume.
Node.js <-> Network <-> Webserver
What really happens is that Node.js asks for a webpage and starts to download it, and there is no way to stop that download, short of dropping the socket.
So, what really happens when you call .pause or .continue in Node.js?
It starts to buffer the request until you are ready to start to consume it again. But the download never stopped.
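A small sketch of that (the URL is a placeholder): pausing only stops the 'data' events on your side, while whatever has already arrived keeps being buffered until you resume:
var http = require('http');

http.get('http://example.com/', function (res) {
  res.on('data', function (chunk) {
    console.log('got ' + chunk.length + ' bytes');
    res.pause();                                    // no more 'data' events for now
    setTimeout(function () { res.resume(); }, 500); // consume again later
  });
  res.on('end', function () { console.log('done'); });
});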
Event Loop
I have a whole answer prepared to explain how the Event Loop works but I think it is better for you to watch this talk.

Node.js server GET to separate API failing after a few hours of use

In my node site I call a restful API service I have built using a standard http get. After a few hours of this communication successfully working I find that the request stops being sent, it just waits and eventually times out.
The API that is being called is still receiving requests from elsewhere perfectly well but when a request is sent from the site it does not reach the API.
I have tried with stream.pipe, util.pump and just writing the file to the file system.
I am using Node 0.6.15. My site and the service that is being called are on the same server so calls to localhost are being made. Memory usage is about 25% overall, with CPU averaging about 10% usage.
After a while of the problem I started using the request module but I get the same behaviour. The number of calls it makes before failing varies, it seems, between 5 and 100. In the end I have to restart the site, but not the api, to make it work again.
Here is roughly what the code in the site looks like:
var Request = require('request');

downloadPDF: function(req, res) {
  Project.findById(req.params.Project_id, function(err, project) {
    project.findDoc(req.params.doc_id, function(err, doc) {
      var pdfileName;
      pdfileName = doc.name + ".pdf";
      res.contentType(pdfileName);
      res.header('Content-Disposition', "filename=" + pdfileName);
      Request("http://localhost:3001/" + project._id).pipe(res);
    });
  });
}
I am lost as to what could be happening.
Did you try to increase agent.maxSockets, or to disable the http.Agent functionality? By default, recent node versions use socket pooling for HTTP client connections; this may be the source of the problem:
http://nodejs.org/api/http.html#http_class_http_agent
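For example (the numbers and path are only illustrative), you can either raise the limit on the default agent or bypass the pooling agent for a single request:
var http = require('http');

// Raise the per-host socket limit of the default agent
// (the default in old node versions was 5):
http.globalAgent.maxSockets = 50;

// Or opt out of the pooling agent entirely for one request:
http.get({ host: 'localhost', port: 3001, path: '/some-project-id', agent: false }, function (res) {
  res.pipe(process.stdout); // or pipe it into the site's response, as in the question
});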
I'm not sure how busy your Node server is, but it could be that all of your sockets are in TIME_WAIT status.
If you run this command, you should see how many sockets are in this state:
netstat -an | awk '/tcp/ {print $6}' | sort | uniq -c
It's normal to have some, of course. You just don't want to max out your system's available sockets and have them all be in TIME_WAIT.
If this is the case, you would actually want to reduce the agent.maxSockets setting (contrary to #user1372624's suggestion), as otherwise each request would simply receive a new socket even though it could simply reuse a recent one. It will simply take longer to reach a non-responsive state.
I found this Gist (a patch to http.Agent) that might help you.
This Server Fault answer might also help: https://serverfault.com/a/212127
Finally, it's also possible that updating Node may help, as they may have addressed the keep-alive behavior since your version (you might check the change log).
You're using a callback to return a value which doesn't make a lot of sense because your Project.findById() returns immediately, without waiting for the provided callback to complete.
Don't feel bad though, the programming model that nodejs uses is somewhat difficult at first to wrap your head around.
In event-driven programming (EDP), we provide callbacks to accomplish results, ignoring their return values since we never know when the callback might actually be called.
Here's a quick example.
Suppose we want to write the result of an HTTP request into a file.
In a procedural (non-EDP) programming environment, we rely on functions that only return values when they have them to return.
So we might write something like (pseudo-code):
url = 'http://www.example.com'
filepath = './example.txt'
content = getContentFromURL(url)
writeToFile(filepath,content)
print "Done!"
which assumes that our program will wait until getContentFromURL() has contacted the remote server, made its request, waited for a result and returned that result to the program.
The writeToFile() function then asks the operating system to open a local file at some filepath for write, waiting until it is told the open file operation has completed (typically waiting for the disk driver to report that it can carry out such an operation.)
writeToFile() then requests that the operating system write the content to the newly opened file, waiting until it is told that the driver the operating system uses to write files tells it it has accomplished this goal, returning the result to the program so it can tell us that the program has completed.
The problem that nodejs was created to solve is to better use all the time wasted by all the waiting that occurs above.
It does this by using functions (callbacks) which are called when operations like retrieving a result from a remote web request or writing a file to the filesystem complete.
To accomplish the same task above in an event-driven programming environment, we need to write the same program as:
getContentFromURL(url, onGetContentFromURLComplete)

function onGetContentFromURLComplete(content, err){
  writeToFile(filepath, content, onWriteToFileComplete);
}
function onWriteToFileComplete(err){
  print "Done!";
}
where
calling getContentFromURL() only calls the onGetContentFromURLComplete callback once it has the result of the web request and
calling writeToFile() only calls its callback to display a success message when it completes writing the content successfully.
The real magic of nodejs is that it can do all sorts of other things during the surprisingly large amount of time procedural functions have to wait for most time-intensive operations (like those concerned with input and output) to complete.
(The examples above ignore all errors which is usually considered a bad thing.)
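For concreteness, a sketched Node version of the same flow (same placeholder URL and file path), with errors handled in the usual Node style:
var http = require('http');
var fs = require('fs');

http.get({ host: 'www.example.com', path: '/' }, function (res) {
  var content = '';
  res.on('data', function (chunk) { content += chunk; });
  res.on('end', function () {
    fs.writeFile('./example.txt', content, function (err) {
      if (err) return console.error(err);
      console.log('Done!');
    });
  });
});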
I have also experienced intermittent errors using the inbuilt functions. As a workaround I use the native wget. I do something like the following:
var exec = require('child_process').exec;

function fetchURL(url, callback) {
  var child;
  var command = 'wget -q -O - ' + url;
  child = exec(command, function (error, stdout, stderr) {
    callback(error, stdout, stderr);
  });
}
With a few small adaptations you could make it work for your needs. So far it is rock solid for me.
Did you try logging your parameters while calling this function? The error can depend on req.params.Project_id. You should also provide error handling in your callback functions.
If you can nail down the failing requests to a certain parameter set (make them reproducible), you could debug your application easily with node-inspector.

Node.js background processing

I'm new to node.js, so please forgive what probably is a naive question :) My question is: what is the best way to set up a non-UI job written in node? The task I've created is used to crawl some web content based upon an Azure queue (the queue message tells the job which content to crawl). All of the examples I see around node are more UI and request based, using http.createServer and listening on a specific port. While I can make this work, this doesn't seem right; it seems like I just need to create some sort of javascript setInterval loop (or something similar) that keeps looking at my queue. Any suggestions or examples that would push me in the right direction would be greatly appreciated.
Chris
I'm not really clear on what you're trying to do, but node doesn't depend on the http stack at all. If you just want to start node and have it process something, that is pretty straightforward. Your app.js could be as simple as:
var queueWorker = require('./worker');

var startWorker = function() {
  if (queueWorker.hasWork()) {
    queueWorker.processQueue(startWorker);
  } else {
    setTimeout(startWorker, 1000);
  }
};

startWorker();
What this is doing is setting up a worker loop where every second it will check to see if there is new work, and if there is start processing it. Once it is done processing the work, go back to the 1 second interval checking for new work.
You would have to create the worker module as the check for hasWork and the processing of said work is application dependent.
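A hypothetical skeleton of that module (worker.js), with everything Azure-specific left as a placeholder:
// worker.js
var pendingMessages = []; // in a real module, refreshed from the Azure queue

exports.hasWork = function () {
  return pendingMessages.length > 0;
};

exports.processQueue = function (done) {
  var message = pendingMessages.shift();
  crawl(message, done); // hand control back to the polling loop when finished
};

function crawl(message, callback) {
  // ... fetch and process the content described by the queue message ...
  setImmediate(callback); // stand-in for the real asynchronous crawl
}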
If you wanted to get a little more fancy, processQueue could spawn a new node process which is only responsible for actually processing the work, then you could keep track of the number of spawned workers versus CPU limitations and have a relatively simple node app which processes data on multiple threads.
