node.js golang composite architecture for web application

I am currently architecting a web app that will use node.js for basic routing. Some parts of the app are more processor intensive and I wanted to use golang for those parts. However, I'm not sure the best way to install and communicate between the two languages. I'm using Amazon Elastic Beanstalk for initial tests, so any specifics can be targeted for that platform.
In essence it boils down to the following 2 questions:
1) How do you install both a node.js and a golang Docker image on Amazon EC2? Amazon has guides for one or the other, but not both.
2) What is the best way to offload processor-intensive tasks from node.js to a golang codebase (I could imagine RPC, or just running golang on some localhost port, but I'm new to this type of thing)? The golang tasks might be things like serious number crunching or complex graph searches.
Thanks for any guidance.

Go is trivial to deploy. Just build it on a Linux box (or use gox) and deploy the binary. (You don't need Go installed on the server to run a Go program.)
There are many options for communicating between Go and Node.js. Here are a few:
If the work you are doing takes a long time it may not be appropriate to have the user wait for a response. For background tasks you can use a queue (like Redis' rpoplpush or a real queue like Kafka or RabbitMQ, or since you're using Amazon: SQS). Push your job as a JSON object onto the queue, then write a Go program that pulls from the queue, does its processing and then writes the final result somewhere.
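For example, the Node side of the queue approach can be as small as this (a minimal sketch assuming a local Redis instance and the ioredis package; the queue name "jobs" and the job fields are just placeholders):

// producer.js - push a JSON job onto a Redis list; a Go worker can BRPOP or RPOPLPUSH it later
const Redis = require('ioredis');
const redis = new Redis(); // assumes Redis is reachable on localhost:6379

async function enqueue(job) {
  await redis.lpush('jobs', JSON.stringify(job));
}

enqueue({ type: 'graph-search', payload: { startNode: 42 } })
  .then(() => redis.quit());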
Go has a jsonrpc library. You can communicate over TCP: serialize a request in Node, read it in Go, then deserialize the response in Node. It's the jsonrpc 1.0 protocol, and for TCP all you have to do is add some message framing (prefix your JSON string with a length) or just newline-separate each request/response.
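A sketch of the Node side of that, assuming the Go process serves net/rpc/jsonrpc over TCP on localhost:4000 and registers a method called Crunch.Do (the port and method name are made up here):

// jsonrpc-client.js - call a Go net/rpc/jsonrpc service over TCP
const net = require('net');

const socket = net.connect(4000, '127.0.0.1', () => {
  // jsonrpc 1.0 request: method, params (an array holding the args struct), id
  socket.write(JSON.stringify({ method: 'Crunch.Do', params: [{ n: 12345 }], id: 1 }) + '\n');
});

socket.on('data', (buf) => {
  const response = JSON.parse(buf.toString()); // { result, error, id }
  console.log(response.result);
  socket.end();
});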
Write a standard HTTP service in Go and just make HTTP calls from NodeJS. (PUT/POST/GET)
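The Node side of the HTTP option is just an ordinary request; here's a sketch that assumes the Go service listens on localhost:8080 and exposes a /crunch endpoint (both placeholders):

// http-client.js - offload work to a Go HTTP service and wait for the JSON result
const http = require('http');

const body = JSON.stringify({ numbers: [1, 2, 3] });
const req = http.request(
  { host: '127.0.0.1', port: 8080, path: '/crunch', method: 'POST',
    headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body) } },
  (res) => {
    let data = '';
    res.on('data', (chunk) => { data += chunk; });
    res.on('end', () => console.log(JSON.parse(data)));
  }
);
req.end(body);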

Related

How to prioritize express requests/responses over other intensive server-related tasks

My node application currently has two main modules:
a scraper module
an express server
The former is a very server-intensive task which runs indefinitely in a loop. It scrapes information from more than 100 URLs, crunches the data and puts it into a MongoDB database (using mongoose). This process runs over and over and over. :P
The latter part, my express server, responds to HTTP/socket GET requests and returns to the requesting client the crunched data which was written to the db by the scraper.
I'd like to optimize the performance of my server so that the express requests and responses get prioritized over the server-intensive task(s). A client should be able to get the requested data ASAP, without the scraper eating up all of my server resources.
I thought about putting the server-intensive task or the express server into its own thread, but then I stumbled upon cluster and child processes, and now I'm totally confused about which approach would be the right one for my situation.
One of the benefits I have is that there is a clear separation between the writing part of my application and the reading part. The scraper writes stuff to the db, and express reads from the db (no POST/PUT/DELETE/... calls are exposed). So, I -guess- I won't run into threading problems with different threads trying to write to the same db.
Any good suggestions? Thanks in advance!
Resources like CPU and memory required by processes are managed by the operating system. You should not waste your time writing that logic within your source code.
I think you should look at the problem from outside your source code files. Once they run, they are processes. Processes are managed, as I said, by the OS.
First, I would split that into two separate commands.
One being the scraper module (eg npm run scraper, that runs something like node scraper.js).
The other one being your express server (eg npm start, that runs something like node server.js).
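In package.json that is just two entries under scripts (the file names here simply match the examples above):

{
  "scripts": {
    "start": "node server.js",
    "scraper": "node scraper.js"
  }
}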
This approach will let you configure that within your OS or your cluster.
A quick approach for that would be to use Docker, with two Docker containers running your projects under CPU usage limits. This is fairly easy to do, does not require you to spin up a new server, and at the same time it provides the isolation level you need to scale to many servers in the future.
Steps to do this:
Learn a little about docker and docker compose and install them on your server
Build a docker image for your application (you can upload it to the free private repository that Docker Hub gives you)
Build a docker compose file for your two services using that image, with the CPU configuration you need (you can set both CPU and memory limits easily, as in the sketch below)
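A minimal docker-compose sketch of that idea (the image name, ports and limit values are placeholders; cpus and mem_limit are available in the version 2 compose file format):

version: "2.4"
services:
  scraper:
    image: myapp              # the image you built for your application
    command: npm run scraper
    cpus: 0.5                 # cap the scraper at half a core
    mem_limit: 512m
  server:
    image: myapp
    command: npm start
    cpus: 1.5
    mem_limit: 1g
    ports:
      - "3000:3000"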
An alternative to that would be running the two commands (npm run scraper and npm start) with a tool like cpulimit, nice, or ionice, or managing namespaces and cgroups manually (but docker does that for you).
PS: Also, I would recommend rethinking your backend process. Maybe it's better to run it every 12 hours or so, instead of all the time, and you may run it from cron instead of a loop.
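For illustration, a crontab entry for running the scraper every 12 hours could look like this (the path is a placeholder):

0 */12 * * * cd /path/to/app && npm run scraper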

Sharing data between node.js app and lua app

I have two applications: a node.js app running on node-webkit, and a lua application. I need to pass data between the two applications at regular intervals, say every 5 to 15 seconds.
The node.js application is the one creating the data, and the lua application is the one consuming the data. The data only goes in one direction.
How should I do the data transfer? I would prefer JSON/XML for the data, but actually it can be in any other format as well. The data moved at a time is not large; it's just some ten parameters at a time.
My initial thought was to just make the node app act as a server and serve the data via a REST API, and have the lua app read the page with LuaSocket or such. But is there a better way to do the transfer if both apps reside on the same machine? Currently the lua app is running on Windows, but that could change.
My background is in web development, so I'm totally lost when it comes to sharing data between applications. I'm also new to lua. Thanks for any answers.
There are many ways to accomplish such a task. I will describe two of them.
The first approach, which I like most, is using a remote queue such as Apache Kafka, Redis, RabbitMQ, or even ZooKeeper for small data; alternatively, store the data in a database. All of these remote storage systems have very good Node.js modules, and all of them can handle JSON and any other data type very well.
Unless this is just a mere test app, it is good to build such fault tolerance into your apps. In your case, imagine if the consumer Lua app goes down, or, the opposite, the Node.js producer app goes down. You don't want the failure of one app to affect the other. In a production environment, it is best to isolate apps and tasks like this. Another advantage of this approach is that one day you may decide to rewrite your consumer in Node.js, Scala, etc., or have multiple consumers in different languages. This doesn't require your server to stop or change. It doesn't even have to know about any changes to the consumer.
So, your producer server always pushes data to a remote data store/queue independently, and the consumer server reads and then deletes the data from this remote store at its own pace.
If you used a database, you would read the new records, consume them, and once done, remove them from the database. This approach allows you to shutdown the consumer and producer apps independently for any reason like upgrade.
Another approach is to establish a direct network connection from the producer server to the consumer server via TCP. The producer server would be a client pushing data to the consumer server. This can be accomplished with the built-in net module, even if the apps are on different physical machines. But as you can see, this is a less reliable solution, because if the consumer goes down, the producer can no longer push the new data, in which case you have to decide what to do with it: discard it or store it somewhere. If you store it somewhere, you end up reimplementing the first approach explained above.
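For illustration, the producer side of that in Node might look like this (the host, port and payload are placeholders; the Lua app would be the listening end):

// producer-tcp.js - push newline-delimited JSON to the consumer over TCP
const net = require('net');

const socket = net.connect(5000, 'consumer-host', () => {
  setInterval(() => {
    const sample = { temperature: 21.5, updatedAt: Date.now() }; // some ten parameters in practice
    socket.write(JSON.stringify(sample) + '\n'); // newline framing keeps messages separable
  }, 5000); // every 5 seconds, as in the question
});

socket.on('error', (err) => {
  // if the consumer is down, you must decide: discard the data or store it somewhere
  console.error('consumer unreachable:', err.message);
});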

How do I set up routing to multiple instances of a node.js server on one url?

I have a simple node.js server app built that I'm hoping to test out soon. It's single-threaded and works fine without any child processing whatsoever. My problem is that the server box has multiple cores, and the simplest way I can think of to utilize them is by running multiple instances of the server app. However, this would require them all to be on the same domain name, and so some sort of request routing is required. I personally don't have much experience with servers in general and don't know if this is a task for node.js to perform or for some other, less complicated program (or more complicated). If there is a node.js mechanism to solve this, for example, if one running instance can send incoming requests to the next instance, then how would I detect when this needs to happen? Conversely, if I use some other program, how will it manage to detect when it needs to start talking to a new instance?
Node.js includes built-in support for managing a cluster of instances of your application to take advantage of multiple cores via the cluster module.
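A minimal sketch of that module in use (the port is a placeholder): each worker runs your app, and connections to the shared port are distributed among the workers.

// server.js - one worker per CPU core, all sharing the same listening port
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  os.cpus().forEach(() => cluster.fork());
  cluster.on('exit', (worker) => {
    console.log('worker ' + worker.process.pid + ' died, starting a new one');
    cluster.fork(); // simple self-healing
  });
} else {
  http.createServer((req, res) => res.end('handled by ' + process.pid + '\n'))
      .listen(3000);
}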

run multiple instances of node.js in parallel

I was thinking about using a reverse proxy to distribute API requests across multiple node.js instances of a REST API. This way it should be possible to achieve much better overall performance, since multiprocessor systems can easily run multiple instances, one per core (or similar).
What are common solutions for such a distribution of requests across multiple node instances, and what are important points to keep in mind?
First and foremost, you can use the cluster module for running many instances of the same server application. It's important to remember to correctly handle shared state, such as storing sessions in a common database.
This works standalone and you can let your users connect directly to that server, or use e.g. nginx, HAProxy, Varnish or lighttpd in front of your server.
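If you go the nginx route, the distribution itself is a short upstream block (a fragment for the http context; the ports are placeholders):

upstream node_api {
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
}

server {
    listen 80;
    location / {
        proxy_pass http://node_api;   # requests are balanced round-robin by default
    }
}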

NodeJS + SocketIO: Scaling and preventing single point of failure

So the first app that people usually build with SocketIO and Node is a chat app. This chat app basically has one Node server that broadcasts to multiple clients. In the Node code, you would have something like this:
// Pseudocode: send the message to every connected client except the sender
for (const client of clients) {
  if (client !== messageSender) {
    client.send(message);
  }
}
This is great for a low number of users, but I see a problem with this. First of all, there is a single point of failure, which is the Node server. Second of all, the app will slow down as the number of clients grows. What is there to do then when we reach this bottleneck? Is there an architecture (horizontal/vertical scaling) that can be used to alleviate this problem?
For that "one day" when your chat app needs multiple, fault-tolerant node servers, and you want to use socket.io to cross communicate between the server and the client, there is a node.js module that fits the bill.
https://github.com/hookio/hook.io
It's basically an event emitting framework to cross communicate between multiple "things" -- such as multiple node servers.
It's relatively complicated to use, compared to most modules, which is understandable since this is a complex problem to solve.
That being said, you'd probably have to have a few thousand simultaneous users and lots of other problems before you begin to run into trouble with this.
Another thing you can do is try to develop your application in such a way that if a connection is lost (which happens all the time anyway), e.g. the server goes down, the client has network issues (e.g. a mobile user), etc., your application should be able to handle that and recover from such issues gracefully.
Since Node.js has a single event-loop thread, this single point of failure is written into its DNA. Even reloading a server after code changes requires this thread to be stopped.
There are, however, a lot of tools available to handle such failures gracefully. You could use forever, a simple CLI tool for ensuring that a given script runs continuously. Other options include distribute and up: Distribute is a load-balancing middleware for Node, and Up builds on top of Distribute to offer zero-downtime reloads using either a JavaScript API or a command-line interface.
Further reading: I find you just need to use the Redis Store with Socket.io to maintain connection references between two or more processes/servers. These options have already been discussed extensively here and here.
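For reference, wiring that up with the socket.io-redis adapter (the successor to the old RedisStore; the host, port and event names are assumptions) looks roughly like this:

// each Node/Socket.IO process attaches the Redis adapter so broadcasts reach
// clients connected to the other processes/servers as well
const io = require('socket.io')(3000);
const redisAdapter = require('socket.io-redis');

io.adapter(redisAdapter({ host: '127.0.0.1', port: 6379 }));

io.on('connection', (socket) => {
  socket.on('chat message', (msg) => {
    io.emit('chat message', msg); // delivered across all processes via Redis pub/sub
  });
});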
There's also the option of using socket.io-clusterhub if you don't intend to use the Redis store.
