Sharing data between node.js app and lua app - node.js

I have two applications, a node.js app running on node-webkit, and a lua application. I would need to pass data between the two applications on regulars intervals, say every 5 to 15 seconds.
The node.js application is the one creating the data, and the lua application is the one consuming the data. The data only goes to one direction.
How should I do the data transfer. I would prefer json/xml for the data, but actually it can be in any other format as well. The data moved at a time is not large. Its just some ten parameters at a time.
My initial thought was to just make the node app act as server and serve the data via rest api, and the lua app just read the page with LuaSocket or such. But is there a better way to do the transfer, if both of the apps reside on same machine? Currently the lua app is running in Windows, but that could change.
My background is in web development, so I'm totally lost when it comes to sharing data between applications. I'm also new to lua. Thanks for any answers.

There are many ways to accomplish such task. I will describe two of them.
The first approach which I like most is using a Remote Queue such as Apache Kafka, Redis, RabbitMQ, or even Zookeeper for small data, alternatively store in a database. All these remote storage systems have very good Node.js modules and all of them can handle JSON and any other data type very well.
Unless this is just a mere test app, it is good to build such fault tolerance into your apps. In your case, imagine if the consumer Lua app goes down, or the opposite, Node.js producer app goes down. You don't want the failure on one app to affect another app. In production environment, it is best to isolate apps and tasks like this. Another advantage of this approach is that one day you may decide to rewrite your consumer in Node.js, Scala, etc. or have multiple consumers in different languages. This doesn't require your server to stop or change. It even doesn't have to know about any changes to the consumer.
So, your production server always pushes data to a remote data store/queue independently, and a consumer server reads then deletes the data from this remote store on its own pace.
If you used a database, you would read the new records, consume them, and once done, remove them from the database. This approach allows you to shutdown the consumer and producer apps independently for any reason like upgrade.
Another approach is to establish a direct network connection from producer server to a consumer server via a TCP. The producer server would be a client pushing data to the consumer server. This can be accomplished with the net build-in module if the apps are on different physical machines. But as you can see, this is less reliable solution because if the consumer goes down, the produce can no longer push the new data in which case you should think what you should do with it: discard or store somewhere. If store somewhere, you end up reimplementing the first approach explained above.

Related

node.js golang composite architecture for web application

I am currently architecting a web app that will use node.js for basic routing. Some parts of the app are more processor intensive and I wanted to use golang for those parts. However, I'm not sure the best way to install and communicate between the two languages. I'm using Amazon Elastic Beanstalk for initial tests, so any specifics can be targeted for that platform.
In essence it boils down to the following 2 questions:
1) How do you install both node.js and a golang docker image on Amazon EC2? Amazon has guides for one or the other, but not both.
2) What is the best way to offload processor intensive tasks from node.js to a golang codebase (I could imaging RPC, or just running golang on some localhost port, but I'm new to this type of thing)? The golang tasks might be things like serious number crunching or complex graph searches.
Thanks for any guidance.
Go is trivial to deploy. Just build it on a linux box (or use gox) and deploy the binary. (You don't need go installed on the server to run a go program)
There are many options for communicating between Go and Node.js. Here are a few:
If the work you are doing takes a long time it may not be appropriate to have the user wait for a response. For background tasks you can use a queue (like Redis' rpoplpush or a real queue like Kafka or RabbitMQ, or since you're using Amazon: SQS). Push your job as a JSON object onto the queue, then write a Go program that pulls from the queue, does its processing and then writes the final result somewhere.
Go has a jsonrpc library. You can communicate over TCP, serialize a request in Node, read it in Go, then deserialize the response in Node. It's the jsonrpc 1.0 protocol and for TCP all you have to do is add some message framing (prefix your json string with a length) or just newline separate each request / response.
Write a standard HTTP service in Go and just make HTTP calls from NodeJS. (PUT/POST/GET)

Faye clustering multiple nodes NodeJS

I am trying to make a pub/sub infra using faye (nodejs). I wish to know whether horizontal scaling would be possible or not.
One nodejs process will run on single core, so when people are talking about clustering, they talk about creating multiple processes on the same machine, sharing a port, and sharing data through redis.
Like this:
http://www.davidado.com/2013/12/18/using-node-js-cluster-with-socket-io-for-push-notifications/
Firstly, I don't understand how we make sure that each of the forked processes goes to a different core. If I fork 10 node servers on a machine with 4 cores, is it taken care that they are equally distributed?
What if I wish to add is a new machine, and thus scale it. I have not seen any such support anywhere. I am not sure if it is even possible to do it.
Let's say somehow multiple nodes are being used and there is some load balancer. But one client will connect to only one server process. So when a client C1 publishes on a channel on which a client C2 has subscribed, and C1 is connected to process P1 and C2 is connected to process P2, how will P1 publish the message to C2 when it doesn't have the connection?
This would probably be possible in case of a single machine, because the cluster module enables all processes to share the same port and the connections too.
I am fairly new to the web world, as well as nodejs and faye. Please enlighten me if there is something wrong in the question.
You are correct in thinking that the cluster module allows multiple cores to be used on a single machine. The cluster module allows the same application to be spawned multiple times whilst listening to the same port. The distribution amongst the cores is down to the operating system, so if you have 10 processes and 4 cores then the OS will figure out how best to distribute them (as long as they haven't been spawned with a set affinity). By default this shouldn't be a concern for you.
Load-balancing can be done through node too but that is separate from clustering. Instead you would have a separate application that would grab the load statistics on each running server and proxy the http request to the most appropriate server (using http-proxy as an example). A very primitive load balancer will send one request to each running server instance incrementally to give an even distribution.
The final point about sharing messages between all the instances assumes that there is a single point where all the messages are held. In the article you linked to they assume that there is only one server and all the processes share access to the redis instance. As they all access the same redis instance, all processes will be able to receive the same messages. If we're going to start thinking about multiple servers that are in different locations in the world that all have different message stores (i.e. their own redis instances) then we get into the domain of 'replication'. Some data stores are built with this in mind and redis is one of them. You end up with a 'master' set of data and a set of 'slaves' that will periodically update with the master and grab anything they are missing. It is important to note here that messages will not be sent in 'real-time' here unless you have a very intensive replication process.
In conclusion, developers go through this chain of scaling for their applications. The first is to make the application multi-process (the cluster module). The second is to have a load balancer that proxies the http request to the appropriate server that is running the multi-process application. The third is to replicate the datastores so that the servers can run independently but keep in sync with each other.

Connecting Node.js applications running in different servers

It is not uncommon to think about distributing the logic of an application between different servers whether because of scalability, security or any other arbitrary concern. In such a scenario it's important to have reliable channels of communication between the separate modules or applications.
A practical case could look like this:
(Server #1) You have a DB table filling up with tasks (in the form of table entries) that need to be processed.
(Server #2) You have an arbitrator that fetches these tasks one by one so as to handle them in some specific fashion.
(Server #3 -- #n) You have multiple worker applications that receive tasks from the arbitrator and return the results back to it.
Now imagine that everything is programmed with Node.js. You want the worker servers to be able to spawn when more resources are needed and be terminated when the processing load is low. When a worker node is created it has to connect back to the arbitrator to signal that it is ready to receive tasks.
What are the available options for communicating the worker nodes with the arbitrator such that the arbitrator can detect when a new worker node is connecting to it and data between both can start to flow. Or, in other words, how to go about creating reliable state-full channels of communication between two remote Node.js applications?
As much as this shouldn't turn into a battle of messaging technologies, another option is RabbitMQ. They have quick tutorials for both worker queues and remote procedure calls (rpc).
Although these tutorials are in python, they are still easy to follow though (and I believe a bit of googling will find you Node translations on github).
In your situation, Rabbit will be able to handle dispatching messages to particular workers, however I think you will have to write your scaling logic yourself.
zeromq is a good option for that.

How do I set up routing to multiple instances of a node.js server on one url?

I have a simple node.js server app built that I'm hoping to test out soon. It's single threaded and works fine without any child processing whatsoever. My problem is that the server box has multiple cores and the simplest way I can think to utilize them is by running multiple instances of the server app. However this would require them all to be on the same domain name and so some sort of request routing is required. I personally don't have much experience with servers in general and don't know if this is a task for node.js to perform or some other less complicated program (or more complicated.) If there is a node.js mechanism to solve this, for example, if one running instance can send incoming requests to the next instance, than how would I detect when this needs to happen? Transversely, if I use some other program how will it manage to detect when it needs to start talking to a new instance?
Node.js includes built-in support for managing a cluster of instances of your application to take advantage of multiple cores via the cluster module.

NodeJS + SocketIO: Scaling and preventing single point of failure

So the first app that people usually build with SocketIO and Node is usually a chatting app. This chatting app basically has 1 Node server that will broadcast to multiple clients. In the Node code, you would have something like.
//Psuedocode
for(client in clients){
if(client != messageSender){
user.send(message);
}
}
This is great for a low number of users, but I see a problem with this. First of all, there is a single point of failure which is the Node server. Second of all, the app will slow down as the number of clients grow. What is there to do then when we reach this bottleneck? Is there an architecture (horizontal/vertical scaling) that can be used to alleviate this problem?
For that "one day" when your chat app needs multiple, fault-tolerant node servers, and you want to use socket.io to cross communicate between the server and the client, there is a node.js module that fits the bill.
https://github.com/hookio/hook.io
It's basically an event emitting framework to cross communicate between multiple "things" -- such as multiple node servers.
It's relatively complicated to use, compared to most modules, which is understandable since this is a complex problem to solve.
That being said, you'd probably have to have a few thousand simultaneous users and lots of other problems before you begin to have problems with this.
Another thing you can do, is try to develop your application in a way so that if a connection is lost (which happens all the time anyway), eg. server goes down, client has network issues (eg. mobile user), etc, your application should be able to handle that and recover from such issues gracefully.
Since Node.js has a single event-loop thread, this single point of failure is written into its DNA. Even reloading a server after code changes require this thread to be stopped.
There are however a lot of tools available to handle such failures gracefully. You could use forever; a simple CLI tool for ensuring that a given script runs continuously. Other options include distribute and up. Distribute is a load balancing middleware for Node. Up builds on top of Distribute to offer zero downtime reloads using either a JavaScript API or command line interface:
Further reading I find you just need to use Redis Store with Socket.io to maintain connection references between two or more processes/ servers. These options have already been discussed extensively here and here.
There's also the option of using socket.io-clusterhub if you don't intend to use the Redis store.

Resources