What are common development issues, pitfalls and suggestions? - node.js

I've been developing in Node.js for only 2 weeks and started re-creating a web site previously written in PHP. So far so good, and looks like I can do same thing in Node (with Express) that was done in PHP in same or less time.
I have ran into things you just have to get used to such as using modules, modules not sharing common environment, and getting into a habit of using callbacks for file system and database operations etc.
But is there anything that developer might discover a lot later that is pretty important to development in node? Issues that everyone else developing in Node has but they don't surface until later? Pitfalls? Anything that pros know and noobs don't?
I would appreciate any suggestions and advice.

Here are the things you might not realize until later:
Node will pause execution to run the garbage collector eventually/periodically. Your server will pause for a hiccup when this happens. For most people, this issue is not a significant problem, but it can be a barrier for building near-time systems. See Does Node.js scalability suffer because of garbage collection when under high load?
Node is single process and thus by default will only use 1 CPU. There is built-in clustering support to run multiple processes (typically 1 per CPU), and for the most part the Node community believes this to be a solid approach. You might be surprised by this reality, though.
Stack traces are often lost due to the event queue, so your logging and debugging methodology needs to change significantly
Here are some minor stumbling blocks you may run into for a while (I still bump up against these)
Remembering to do callback(null, value) on a successful callback. Passing null as a first parameter is weird and thus I forget to do it. Instead I accidentally do callback(value), which is interpreted as an error by the caller until I debug into it for a while and slap my forehead.
forgetting to use return when you invoke the callback in a guard clause and don't want a function to continue to execute past that point. Sometimes this results in the callback getting invoked twice, which causes all manner of misbehavior.
Here are some NICE things you might not realize initially
It is much easier in node.js, using one of the awesome flow control libraries, to do complex operations like loading 3 network resources in parallel, then making 2 DB calls in serial, then writing to 2 log files in parallel, then sending an HTTP response. This stuff is trivial and beautiful in node and damn near impossible in many synchronous environments.
ALL of node's modules are new and modern, and for the most part, you can find a beautifully-designed module with a great API to do what you need. Python has great libraries by now, too, but compare Node's cheerio or jsdom module to python's BeautifulSoup and see what I mean. Compare python's requests module to node's superagent.
There's a community benefit that comes from working with a modern platform where people are focused on modern web development. The contrast between the node community and the PHP community cannot be overstated.

Related

Node: port some JSON processing logic to wasm, worth?

In Node, does it make sense to port some logic to wasm for json manipulation operations in order for the algorithm to be quicker?
tl;dr Don't use WASM for JSON manipulation.
Setting up the WASM tooling is no small thing, and will likely require more ongoing troubleshooting and maintenance than lighter-weight solutions.
Wasm generally comes into its own when you need to do truly compute-intensive stuff like customized signal (image/video/audio) processing for your browser users or natively inside server-size Javascript environments.
If your purpose in doing this is avoiding overhead in JSON processing, you might consider looking for fast JSON packages on npm. fast-json-stable-stringify is an example.
You should also keep this in mind: V8 -- the open-source Javascript engine in nodejs, deno, and chromium -- is under active development by a large and highly competent team at Google. That engine includes the JSON. functions. Performance improvements show up in every release, and they're highly motivated to keep making things faster. JSON handling is one of those areas. If you roll your own, your code won't get any faster with the next release of V8/nodejs. If you use the stuff in V8, it might.
Take a look at this dev summit presentation, for example. Maybe you can outsmart the V8 team. If so you should join them.
So, if you want to learn WASM, do something that will add interesting new capabilities to your personal toolbox.
Unlikely. The V8 engine has a pretty good JIT compiler. If it is a hot path and properly written, it will become machine code anyway.

hesitation between two technologies for a little program

I want to make a program (more precisely, a service) that periodically scans directories to find some video files (.avi, .mkv, etc) and automatically download some associated files (mostly subtitles) from one or several websites.
This program could run on linux or windows as well.
On one hand, I know well Qt from a long time and I know all its benefits, but on the other hand, I'm attracted by node.js and it extreme flexibility and liveliness.
I need to offer some interactivity with the end user of my program (for instance, chose the scans directories, etc).
What would be the best choice in your opinion in 2013?
I advise against Node.js for "small tools and programs". Especially for iterative tasks.
The long story
The reason is quite simply the way Node.js works. Its asynchronous model makes simple tasks unnecessarily convoluted. Additionally, because many callbacks are called from the Node.js event loop, you can't just use try/catch structures so every tiny error will crash your whole Application.
Of course there are ways to catch those errors or work with them, but the docs advise you against all of them and advise you to restart the application gracefully in any case to prevent memory leaks. This means you have to implement yet another piece of code.
The only real solution in Node.js would be writing your Application as a Cluster, which is a great concept but of course would require you to use some kind of IPC to get your data back to a process that can handle it.
Also, since you wrote about "periodically scan"ning a directory, I want to point out that you should...
Use file system watchers for services
Almost every language kit has those now and I strongly suggest using those and only use a fallback full-scan.
In Qt there is a system-independent class QFileSystemWatcher that provides a handy callback whenever specified files are changed
In Java there is the java.nio.file.FileSystem.getWatchService()
Node.js has the fs.watch function, if you really want to go for it

Simulate website load on Node.JS

I am thinking of creating my own simple load test, where I can hit my website with multiple requests (like 100-1000 concurrent users) to see how it performs. I want to try Node.js out, but I don't know if it is the wrong technology for the job, since Node.js don't use threads?
Can I with the async model that Node.js uses, simulate the many user requests, or would that be more appropiate to do in another language like Ruby/.NET/Python?
Node.js ought to be perfect for the task. I do this at work. The one crucial piece that you will have to change is the http socket pool. The following code snipped will disable pooling entirely, letting you starve your Node.js process if you want to.
var http = require('http');
var req = http.request(..., agent: false)
You can read about this more at the http.Agent documentation.
Your concern about threads is astute, but even if you hit that limit (Node is very good at keeping your resources efficient) the solution is simple: start multiple instances (processes) of your load test. As it is, you may have to use multiple machines entirely to correctly simulate load.
In any case, you will not win automatically by using Ruby or Python for this. Asynchronous programming is ideal for I/O and network-bound tasks, and Node excels at this. Similarly, while Ruby and Python have third-party asynchronous frameworks, they're by definition more obscure than the standard asynchronous framework given in Node.
Node can fire off pretty much as many requests as you want it to (though you may have to change the defaults for http:Agent). You're more likely to be limited by what your OS can do than by anything inherent in node (and of course such limitations will apply in any other language you use).
It's simple to create load tests with nodeload.

nodejs job server (multiple purpose)

I'm fairly new and just getting to know node.js (background as PHP developer). I've seen some nodeJs examples and the video on nodejs website.
Currently I'm running a video site and in the background a lot of tasks have to be executed. Currently this is done by cronjobs that call php scripts. The downsite of this approach is when an other process is started when the previous is still working you get a high load on the servers etc.
The jobs that needs to be done on the server are the following:
Scrape feeds from websites and insert them in mysql database
Fetch data from websites (scraping) (upon request)
Generate data for reporting. These are mostly mysql queries that need to be executed.
Tasks that need to be done in the future
Log video views (when a user visits a video page) (this will also be logged to mysql)
Log visitors in general
Show ads based on searched video
I want to be able to call an url so that a job can be queued and also be able to schedule jobs by time or they can run constantly.
I don't know if node.js is the path to follow that's why I'm asking it here. What are the advantages of doing this in node? The downsites?
What are the pro's here with node.js?
Thanks for the response!
While traditionally used for web/network tasks (web servers, IRC chat servers, etc.), Node.js shines when you give it any kind of IO bound (as opposed to CPU bound) task, since it uses completely asynchronous IO (that is, all IO happens outside of the main event loop). For example, Node can easily hold open many sockets, waiting for data on each, or stream data to and from files very efficiently.
It really sounds like you're just looking for a job queue; a popular one is Resque, and though it's written for Ruby, there are versions for PHP, Node.js, and more. There are also job queues built specifically for PHP; if you want to stick to PHP, a Google search for "PHP job queue" make take you far.
Now, one advantage to using Node.js is, again, its ability to handle a lot of IO very easily. Of course I'm just guessing, but based on your requirements, it could be a good tool for the job:
Scrape data/feeds from websites - mostly waiting on network IO
Insert data into MySQL - mostly waiting on network IO
Reporting - again, Node is good at MySQL queries, but probably not so good at analyzing data
Call a URL to schedule a job - Node's built-in HTTP handling and excellent web libraries make this a cinch
So it's entirely possible you may want to experiment with Node for these tasks. If you do, take a look at Resque for Node or another job system like Kue. It's also not very hard to build your own, if you don't need something complicated--Redis is a good tool for this.
There are a few reasons you might not want to use Node. If you're not familiar with JavaScript and evented and continuation-passing style programming, Node.js may have a bit of a learning curve, as you have to force yourself to stop thinking synchronously. Furthermore, if you do have a lot of heavy non-IO tasks in your program, such as analyzing data, Node will not excel as those calculations will block the main event loop and keep Node from handling callbacks, etc. for your asynchronous IO. Finally, if you have a lot of logic already in PHP or another language, it may be easier and/or quicker to find a solution in your language of choice.
I second the above answers. You don't necessarily need a full-service job queue, however: you can use flow-control modules like async to run tasks in parallel or series, as fast as they'll go or with controlled concurrency.
Node.js has many powerful scraping/parsing tools. This post mentions a few; I just heard about Trumpet recently; there are probably dozens of options. Node.js has a Stream module in core and Request makes HTTP interactions extremely easy.
For timed tasks, the simplest approach is a basic setTimeout/setInterval. Or you could write the scraper as a script that's called on cron. Or have it triggered on some event using the EventEmitter module in core. etc...
Uncontrolled amount of node js parallel jobs may lay down your server. You will need to control processes or in better way put them in queue for each task
For this needs and if you know php I suggest to use gearman and add jobs by needs or by small php scripts

Node.js event vs thread programming on server side

We are planning to start a fairly complex web-portal which is expected to attract good local traffic and I've been told by my boss to consider/analyse node.js for the serve side.
I think scalability and multi-core support can be handled with an Nginx or Cherokee in front.
1) Is this node.js ready for some serious/big business?
2) Does this 'event/asynchronous' paradigm on server side has the potential to support the heavy traffic and data operation ? considering the fact that 'everything' is being processed in a single thread and all the live connections would be lost if it got crashed (though its easy to restart).
3) What are the advantages of event based programming compared to thread based style ? or vice-versa.
(I know of higher cost associated with thread switching but hardware can be squeezed with event model.)
Following are interesting but contradicting (to some extent) papers:-
1) http://www.usenix.org/events/hotos03/tech/full_papers/vonbehren/vonbehren_html
2) http://pdos.csail.mit.edu/~rtm/papers/dabek:event.pdf
Node.js is developing extremely rapidly, and most of its functionality is sturdy and ready for business. However, there are a lot of places where its lacking, like database drivers, jquery and DOM, multiple http headers, etc. There are plenty of modules coming up tackling every aspect, but for a production environment you'll have to be careful to pick ones that are stable.
Its actually much MUCH more efficient using a single thread than a thousand (or even fifty) from an operating system perspective, and benchmarks I've read (sorry, don't have them on hand -- will try to find them and link them later) show that it's able to support heavy traffic -- not sure about file-system access though.
Event based programming is:
Cleaner-looking code than threaded code (in JavaScript, that is)
The JavaScript engine is extremely efficient with processing events and handling callbacks, and its easily one of the languages seeing the most runtime optimization right now.
Harder to fit when you are thinking in terms of control flow. With events, you can never be sure of the flow. However, you can also come to think of it as more dynamic programming. You can treat each event being fired as independent.
It forces you to be more security-conscious when programming, for the above reason. In that sense, its better than linear systems, where sometimes you take sanitized input for granted.
As for the two papers, both are relatively old. The first benchmarks against this, which as you can see, has a more recent note about these studies:
http://www.eecs.harvard.edu/~mdw/proj/seda/
It also cites the second paper you linked about what they have done, but refuses to comment on its relevance to the comparison between event-based systems and thread-based ones :)
Try yourself to discover the truth
See What is Node.js? where we cover exactly that:
Node in production is definitely possible, but far from the "turn-key" deployment seemingly promised by the docs. With Node v0.6.x, "cluster" has been integrated into the platform, providing one of the essential building blocks, but my "production.js" script is still ~150 lines of logic to handle stuff like creating the log directory, recycling dead workers, etc. For a "serious" production service, you also need to be prepared to throttle incoming connections and do all the stuff that Apache does for PHP. To be fair, Rails has this exact problem. It is solved via two complementary mechanisms: 1) Putting Rails/Node behind a dedicated webserver (written in C and tested to hell and back) like Nginx (or Apache / Lighttd). The webserver can efficiently serve static content, access logging, rewrite URLs, terminate SSL, enforce access rules, and manage multiple sub-services. For requests that hit the actual node service, the webserver proxies the request through. 2) Using a framework like "Unicorn" that will manage the worker processes, recycle them periodically, etc. I've yet to find a Node serving framework that seems fully baked; it may exist, but I haven't found it yet and still use ~150 lines in my hand-rolled "production.js".

Resources