Approaches to building a Node.js backend: use one or several instances?

I am planning to build a Node.js application as the back end of a web application.
This server application will provide a REST API, a WebSockets service, and a process that scrapes some sites with headless navigation (like zombie.js) every n hours to feed a database.
I would like to ask whether it is a good approach to build all of this in one Node.js instance, or whether it is better to use a separate Node.js application for each task.

If you have a small application (which does not need to scale in the future), you can keep all of it (REST, sockets, scraping) in the same project.
Note: if you process HTML content after scraping, that processing takes time and runs synchronously, so the event loop will be blocked and the REST API will not handle any requests during that window. In this scenario, you can keep REST and sockets together in one project and the scraping in another.
If you are planning to scale in the near future, I suggest keeping separate instances, considering the benefit of SOA in scaling.
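For example, here is a minimal sketch of that split using Node's child_process module; the file name scraper.js and the message handling are hypothetical:

    // main.js - the REST/socket server forks the scraper, so heavy HTML
    // processing runs in its own process and cannot block this event loop.
    const { fork } = require('child_process');

    const scraper = fork('./scraper.js');   // hypothetical worker script

    scraper.on('message', (rows) => {
      // The worker sends parsed rows back; insert them into the database here.
      console.log('scraped rows received:', rows.length);
    });

    // Trigger a scrape every n hours (here n = 6).
    setInterval(() => scraper.send('scrape'), 6 * 60 * 60 * 1000);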

In my opinion the better way is to use separate Node.js applications. The first one will serve your API. Another one will work as a socket server on a different port.
As for the scraper, it can be a PHP (or Node.js) script that runs as a cron job. To learn how to set up a cron job, you can check this question:
https://askubuntu.com/questions/2368/how-do-i-set-up-a-cron-job
or this tutorial for Ubuntu server: https://help.ubuntu.com/community/CronHowto
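For example, a crontab entry like this one (the script path is hypothetical) would run a Node.js scraper every 6 hours:

    # run the scraper every 6 hours, appending output to a log file
    0 */6 * * * /usr/bin/node /home/app/scraper.js >> /var/log/scraper.log 2>&1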
I think this will be the best approach.


Options for getting a CPU intensive job off my web server?

I have been working on a Web App for visualizing live data. It is crucial that this data is kept up to date on the client side without such updates being invoked directly by the client (e.g. no button presses or refreshing the page). Currently, on page load, I grab the current data set from a database (DynamoDB) via Ajax, and subsequent updates are pushed to any listening clients every 5 minutes via a Websockets connection (using Socket.io).
I have overlooked the computational load of this update job. It has to mine some data, process it, update the database, and send the update out to all clients. As a result, the web server is left unresponsive for about 30 seconds with each update. Furthermore, my current architecture limits me from putting my server behind a load balancer, which is something I anticipate coming up in the future. For both these reasons, I really need to get this update job off my web server.
I am relatively inexperienced in web development, and I don't feel I am knowledgeable enough about these technologies to know the drawbacks of the solutions I have come up with. Currently, I am considering:
Break the update off into a separate process so it does not block the Node event loop. This would solve my issue in the short term, but if I ever want to load balance my application, I can't have the update running on multiple machines.
Drop Websockets entirely and just have the client query the database every 5 minutes, while a separate process (or separate server if I want load balancing) keeps the database up to date without interacting directly with the client. Will this kind of access pattern put too much load on my db?
Have a separate server run the update, and send the result via Websockets (or maybe some other protocol) to my load balanced application servers, which then push that update to all listening clients as usual. Is this even possible?
Perhaps there are other solutions. It seems like this would be a relatively common problem, so I was hoping I could find some guidance here. What are the potential issues with the solutions I have proposed, and are there other possible solutions that may suit my use case better?
It sounds like you want one process sitting somewhere which crunches the data and publishes it to a stream. Clients can then subscribe to the stream as and when they like. Redis handles streams nicely: you could process your data and push it into a Redis stream, then create a small Node service which subscribes to the Redis stream and pushes the formatted data out over a WebSocket or via polling.
In this scenario you can then scale up either the publishing process (the one crunching the numbers) if your data load goes up, or the subscriber process (which serves the data over a WebSocket to browsers) if you get an influx of clients watching the data.
You can also easily distribute the hosting of these services across other machines, and even write them in different languages if you decide the number crunching needs something like threading.
You're then left with the issue of clients (web browsers) consuming this data with a load balancer in between. That can be a hard problem if you use WebSockets, and it comes with its own pros and cons. But importantly you'll have separated your data crunching from your result publishing, which isolates your issue to only the load balancing.
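As a minimal sketch of the subscriber half, assuming the redis and socket.io npm packages and plain pub/sub in place of a full Redis stream (the channel name data-updates is made up):

    const { createClient } = require('redis');
    const { Server } = require('socket.io');

    const io = new Server(3000);          // browsers connect to this port
    const subscriber = createClient();    // local Redis by default

    (async () => {
      await subscriber.connect();
      // The number-crunching process publishes each result to this channel.
      await subscriber.subscribe('data-updates', (message) => {
        io.emit('new-data', JSON.parse(message));  // fan out to all clients
      });
    })();

The crunching process only needs a Redis connection to publish, so it can live on any machine.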
I have done pretty much the same thing to check resources on some of our servers.
I have a C# service getting the information on each server that we manage and sending it to a queue (AMQ).
From there, I have a STOMP client fetching data from AMQ and emitting it to a WebSocket.
My main microservice fetches the data to save it into a DB.
My visualisation web app is connected to the same WebSocket and fetches the data as it is sent, to display it.
The AMQ step isn't mandatory at all; it's just something I had to work with (historical reasons).
I don't know what type of data you are working with, so I don't know if my solution can apply to you.
Don't hesitate if I'm not clear or you have any questions.
This is a big question and I'm not going to try and give you a definitive answer.
For option 2
It really depends on how expensive your queries are. You can make DynamoDB fast if you pay for enough throughput. That said, on the face of it, re-loading your whole dataset, which sounds like it is probably large, probably isn't good engineering.
For option 3
This option seems best to me if it's achievable, although admittedly it's hard to say with such a complex system - obviously you can't share your whole project.
Given you are already using AWS, you might want to look into AWS Lambda. If you can move the update process into a stand-alone job, you can host it on Lambda and take the load off the web server. Lambda is essentially infinitely scalable and you only pay for the compute you use.
This really depends on you being able to split the update task off into a separate service. It's likely you would need a fair bit of refactoring to isolate it as a service. If you can break little bits off at a time, and make the move gradually, even better.
If you consider trying this, and you've not used Lambda before, I would definitely start small with some hello-world examples. Then try a very simple service in your application, and build up to taking on the update service.
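As a rough sketch, the stand-alone job could look like this as a Lambda handler; the three helpers are hypothetical stand-ins for the real mining, processing and saving steps:

    // Hypothetical stand-ins for the real steps of the update job.
    async function mineData() { /* fetch raw data from the sources */ return []; }
    function processData(raw) { /* the CPU-heavy crunching */ return raw; }
    async function saveResult(result) { /* write the result to DynamoDB */ }

    // Lambda can invoke this on a schedule (e.g. every 5 minutes),
    // so the web server never runs the heavy job at all.
    exports.handler = async () => {
      await saveResult(processData(await mineData()));
      return { statusCode: 200 };
    };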
You might also consider looking into AWS Simple Queue Service (SQS) to handle the comms between clients and server.
Database tuning
If a lot of your update time is spent waiting for database actions to complete, rather than on server processing, you can look at tuning that side of things. Things to consider are:
Buying more throughput
Using batch operations (as these move load from your server to DynamoDB) - see the sketch after this list
Tuning keys, indexes and database access
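For instance, a hedged sketch of a batch write with the AWS SDK for JavaScript v3 (the Updates table name and item shape are made up):

    const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
    const { DynamoDBDocumentClient, BatchWriteCommand } = require('@aws-sdk/lib-dynamodb');

    const doc = DynamoDBDocumentClient.from(new DynamoDBClient({}));

    // BatchWriteItem takes up to 25 put/delete requests per call,
    // shifting per-item overhead from your server onto DynamoDB.
    async function saveUpdates(items) {
      await doc.send(new BatchWriteCommand({
        RequestItems: {
          Updates: items.map((item) => ({ PutRequest: { Item: item } })),
        },
      }));
    }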

Scalability of server-side application, best approach

I'm programming a server-side application which will manage requests from:
Game client
Website (HTTP requests)
API
As of now I'm using only one (Node.js) application for every type of request; the problem is that with a growing user base, this approach will create a bottleneck.
I would like some advice on how to design the server-side architecture so that it will be scalable.
The only solution that I know of is to use multiple servers running the same application, sharing state through the same memory store (a Redis server).
Is it possible in Node.js to split the handling of these types of requests across multiple servers? Maybe one or more servers for each type of request?
Currently I'm using:
NodeJS
Redis
MySQL
Express
Socket.io
Thanks in advance. Can you recommend some books on this matter?
On one machine, to harness the power of a multicore architecture, you can use the Node.js Cluster module (https://nodejs.org/api/cluster.html).
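A minimal sketch of that pattern, forking one worker per CPU core (all workers share the same listening port):

    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isMaster) {
      // Fork one worker per core and replace any worker that dies.
      for (let i = 0; i < os.cpus().length; i++) {
        cluster.fork();
      }
      cluster.on('exit', (worker) => {
        console.log('worker ' + worker.process.pid + ' died, restarting');
        cluster.fork();
      });
    } else {
      http.createServer((req, res) => res.end('ok')).listen(3000);
    }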
I think it is a good idea to split the API and the website into different applications. If you decide to run multiple Node.js applications on one machine, try using pm2 (http://pm2.keymetrics.io/). You could probably also split your API into a bunch of small applications - this is called a microservice architecture. I personally don't like the microservice approach; you can check the web for its pros and cons.
Also, if you deploy your application (or bunch of applications) on different virtual/physical machines (which is usual in production), you can use HAProxy for load balancing and fault tolerance (http://www.haproxy.org/).

Separate node apps or combined?

I am trying to plan out the best way to approach developing a Node application, and am not sure what would provide the best performance. A little bit of info on the overall plan: the entire project will involve a web app as well as a 'bot' app. The bot app in question is node-steam, which is quite a substantial application on its own. My question is whether I should run a separate Node process for each app (one for the web server and one for node-steam), or code them into one combined Node process.
Also please note that I will need the web app to be able to communicate with node-steam. I am planning on integrating socket.io into node-steam to make calls to it via web app actions. Is this the best approach if I keep the apps as separate Node processes?
EDIT: When I refer to letting the web app communicate with node-steam, I mean that there are functions which need to be triggered in node-steam when a user does something in the web app (namely, performs specific actions in the browser), so I am planning on doing this via a socket directly to the node-steam app, rather than going through the web app and then routing the calls on to node-steam. As far as I can tell this is the simplest way of doing it.
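A minimal sketch of that direct link, assuming the socket.io-client package on the web-app side; the port and the steam-action event name are made up:

    const { io } = require('socket.io-client');

    // The node-steam process runs its own socket.io server on port 4000.
    const bot = io('http://localhost:4000');

    // Call this from the web app whenever a user performs a relevant action;
    // the node-steam app listens for 'steam-action' and triggers its functions.
    function forwardUserAction(action) {
      bot.emit('steam-action', action);
    }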
Any guidance is appreciated.
Thank you
If the bot app is quite substantial, you should code it for scalability. You will probably be spawning worker threads, or even scaling across multiple nodes, in no time for the bot app alone.

Do I need 2 separate applications in Node.js, one for visitors and one for CRUD admins?

Intro
I'm trying out Node.js right now (coming from a PHP background).
I'm already catching the vibe of the workflow with it (events, promises, inheritance... haven't figured out streams yet).
I've chosen a graphic portfolio web app as my first Node.js project. I know Node.js might not fit best for this use case, but it's a good playground and I need to do this anyway.
The concept:
The visitors will only browse through pretty pictures in albums; no logging in or subscriptions, nothing.
The administrators will add, modify, reorder... CRUD the photo albums. So there I need auth, ACL, validation, ImageMagick... a lot more than just on the front end.
Currently I'm running one instance of Node.js, so both the admin and visitor code live in one shared codebase and one shared Node memory runtime, which to me looks suboptimal performance-wise.
Question
For both performance and usability:
Should I continue running one instance of Node for both the admin and visitor areas of the web app, or should I run them as 2 separate instances? (Or subtasks? Honestly, I haven't worked with subtasks/child processes.)
Ideas floating around
Use nginx as proxy if splitting into 2 applications
Look into https://stackoverflow.com/a/5683938/339872. There are some interesting mentions of tools to help manage processes.
I would set up admin.mysite.com and have that hosted on another server... or use nginx to proxy requests from that domain to your admin.js node app.
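A minimal nginx sketch of that split, assuming the admin app listens on port 4000 and the visitor app on port 3000 (hostnames and ports are made up):

    server {
        server_name admin.mysite.com;
        location / {
            proxy_pass http://127.0.0.1:4000;   # admin.js Node app
        }
    }
    server {
        server_name mysite.com;
        location / {
            proxy_pass http://127.0.0.1:3000;   # visitor-facing Node app
        }
    }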

Node.js and Socket.io - how far can they go with real time web applications?

I am going to build a web application to manage notes (think of something similar to Evernote). I have decided to use Backbone.js + jQuery on the client side.
Server-side, I haven't decided yet: either pure PHP (which I know very well) or Node.js + Socket.io (completely new to me).
I am considering Node.js + Socket.io because I would like my web application to be real-time (i.e. if a user updates a note, that note gets updated instantly for a collaborator who shares it, without reloading the page).
I was also considering, as a third alternative, using Node.js and Socket.io for the UI and PHP for the REST API (I feel more comfortable building an API with PHP). The PHP and JavaScript code would share the MongoDB database.
My question is this: if I develop the REST API for my web application with PHP, and a new note gets created for the user through the API (i.e. the Android app sends an API request to create that note), will Node.js, Socket.io and Backbone.js be able to instantly update the UI of the user and show the new note on their screen? I think that can be called a "push notification".
I hope I was clear enough.
Also, is there any other outstanding technology for building real-time web applications?
Yes, Node.js + Socket.IO will do a very good job of this. Node uses an event loop: upon arrival, a request is entered into a queue, and Node deals with these requests one by one. Traditional web servers take a 'thread-per-request' approach, where a thread is created to handle each request.
The benefit of Node here is that it doesn't need to context-switch so often, which means it can deal with these requests very quickly... most likely faster than your PHP server. However, Node runs as a single process on a single CPU core. If your application is CPU-intensive it could block, meaning the time for each request will be slower.
However, it sounds to me like your application isn't CPU-intensive, meaning Node.js will work well.
Decision
If your time is limited, and you don't want to learn a new skill (Node), PHP will be fine. If you have the time I recommend learning Node.js, as it is very strong when it comes to I/O intensive tasks such as a REST API for creating Notes.
Updating the UI
If your intended use is through a mobile device, I recommend using WebSockets but having a fallback such as long-polling. It is possible to update the Client UI using either Node or PHP. However from my experience it is much easier to do so using Socket.IO on Node.js.
Example: updating the client using Node.js / Socket.io
Client-side
    socket.on('new-note', function (data) {
      placeNewNote(data);
    });
Server-side
    socket.emit('new-note', data);
Getting Started With Node:
How do I get started with Node.js
Please also note: if you want to build a native Android mobile app which uses WebSockets, you will need to use the Java Socket.IO client.
Using Node.js for both the web server and the push server is of course the best way. Especially since, if you are going to use Node.js anyway, you have to learn it, so learning how to make a web server is only natural (I advise using the most popular framework, Express).
Now, you could use PHP for the web server and Node.js for the push server. To make them communicate with each other you would probably want to add Redis to your application. Redis will allow you to push notifications to any client connected to it, like the PHP server or the Node.js push server (and it scales well). From that point the push server will push the data on to the client's browser.
An alternative technology would be, for example, the Twisted server. Of course you'll need to learn Python in order to use it, and I don't know whether it supports WebSockets correctly. I think you should stick with Node.js + Socket.IO.
