heroku: route subdirectory to a second node.js app? - node.js

I have a heroku node.js app running under the domain foo.com. I want to proxy all urls beginning with foo.com/bar/ to a second node.js process - but I want the process to be controlled within the same heroku app. Is this possible?
If not, is it possible to proxy a subdirectory to a second heroku app? I haven't been able to find much control over how to do routing outside of the web app's entry point. That is, I can easily control routing within node.js using Express for example, but that doesn't let me proxy to a different app.
My last resort is simply using a subdomain instead of a subdirectory, but I'd like to see if a subdirectory is possible first. Thanks!
Edit: I had to solve my problem using http-proxy. I have two express servers listening on different ports and then a third externally facing server that routes to either of the two depending on the url. Not ideal of course, but I couldn't get anything else to work. The wrap-app2 approach described below had some url issues that I couldn't figure out.
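A rough sketch of that workaround, in case it helps anyone (the ports and the /bar prefix are placeholders):
var express = require('express');
var httpProxy = require('http-proxy');

// Two internal Express servers on private ports
var mainApp = express();
mainApp.get('/', function (req, res) { res.send('main app'); });
mainApp.listen(5000, 'localhost');

var barApp = express();
barApp.get('/bar', function (req, res) { res.send('bar app'); });
barApp.listen(5001, 'localhost');

// Third, externally facing server picks a target by URL prefix.
// Note /bar is forwarded as-is, so the second app must route /bar/*
// itself (or the prefix has to be stripped here) - this was the fiddly part.
var proxy = httpProxy.createProxyServer({});
var front = express();
front.use(function (req, res) {
  var target = req.url.indexOf('/bar') === 0 ? 'http://localhost:5001' : 'http://localhost:5000';
  proxy.web(req, res, { target: target });
});
front.listen(process.env.PORT || 3000);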

Just create a new express server and mount it as middleware in the main one, so the secondary app handles any request that arrives under your desired path:
var express = require('express');
var app = express();  // main app

var app2 = express(); // secondary app
app2.use(function (req, res) {
  res.send('Hey, I\'m another express server');
});
// Mount the secondary app at /foo; it sees paths relative to the mount point
app.use('/foo', app2);
I haven't tried it on Heroku yet, but it's the same process and doesn't create any new TCP binding or child process, so it will work. For reference, a modified plain express template.
And if you really want another Express process handling the connection, you need to use cluster. Check the worker.send utility.
app.use('/foo', function (req, res) {
  // Pseudocode: a message plus a socket handle can be passed to a forked
  // worker, but a response object itself can't be serialized, so in
  // practice you'd send the underlying socket (res.socket) instead.
  worker.send('foo', res.socket);
});

This is possible. The most elegant way I could think of is by using clustering. One Heroku dyno contains four cores, so you can run four worker processes under a single Node master.
Here is an introduction to clustering.
What you're looking at is initializing two express apps (assuming you're using express) and serving each of them from its own worker processes.
var cluster = require('cluster');

// envForApp1/envForApp2 are env objects passed to the workers,
// e.g. { NODE_ENV: 'app1' } and { NODE_ENV: 'app2' }
if (cluster.isMaster) {
  // let's make four child processes
  for (var i = 0; i < 4; i++) {
    if (i % 2 === 0) {
      cluster.fork(envForApp1);
    } else {
      cluster.fork(envForApp2);
    }
  }
} else {
  // refer to NODE_ENV and see whether this should be your app1 or app2
  // which should be started. This is passed from the fork() before.
  app.listen(8080);
}

Related

Socket.io + REST API + REACT - is it better to separate socket.io from REST API

My question could be flagged as "opinion based" but I am wondering which approach is best for my application, as I am able to do it both ways.
I am building a chat application in which users and conversations are saved in MongoDB. I will have my react application consuming the API/APIs. The question is - is it better to have the REST API and Socket.io applications running separately? For example:
Have REST API running on port 3005
Have Socket.io running on port 3006
The React application consumes these 2 separately, and basically they will not know about each other. The REST API endpoints and socket.io events will be invoked only from the front-end.
On the other hand, I can have my socket.io application and REST API working together in 1 big application. I think it is possible to make it work without problems.
To sum up, at first glance I would take the first approach - cleaner and easier to maintain. But I would like to hear other opinions, or from somebody who has built a similar project. How are things usually done in this kind of project when you have socket.io and a REST API?
I would check the pros and cons of both scenarios. For example, code and resource reusability is better if you have a single application, and you don't have to care about which versions are compatible with each other. On the other hand, one error can kill both applications, so from a security perspective it is better to have separate applications. I think the decision depends on which pros and cons are important to you.
you can make a separate file for socket.io logic like this:
// socket.mjs file
import { Server } from "socket.io"

let io = new Server()

const socketApi = {
  io: io
}

io.on('connection', (socket) => {
  console.log('client connected:', socket.id)
  socket.join('modbus-room')

  socket.on('app-server', data => {
    console.log('**************')
    console.log(data)
    io.to('modbus-room').emit('modbus-client', data)
  })

  socket.on('disconnect', (reason) => {
    console.log(reason)
  })
})

export default socketApi
and add it to your project like this:
// index.js or main file
// ...
import socketApi from "../socket.mjs";
// ...

/**
 * Create HTTP server.
 */
const server = http.createServer(app);
socketApi.io.attach(server);
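Because socketApi is a plain module export, any other file, such as a REST route, can import the same io instance and emit from an HTTP handler. A minimal sketch (the route file and payload are assumptions):
// routes/messages.mjs (hypothetical) — reuses the shared io instance
import express from "express";
import socketApi from "../socket.mjs";

const router = express.Router();

router.post("/messages", (req, res) => {
  // push the new message to the room, then answer the HTTP request
  socketApi.io.to("modbus-room").emit("modbus-client", req.body);
  res.status(201).json(req.body);
});

export default router;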

NodeJS Express - Two NodeJS instances on same port (vhost)

I'm trying to run 2 instances of NodeJS on the same port and server, from different server.js files (different dir, config etc). My server provider told me that a vhost is running for a different domain, hence the question: how do I handle it in a NodeJS Express app? I've tried to use vhost from https://github.com/expressjs/vhost like this:
const app = express();
const vhost = require('vhost');
app.use(vhost('example1.org', app));

// Start up the Node server
app.listen(4100, () => {
  console.log(`Node server listening on 4100`);
});
And for the second application like this:
const app = express();
const vhost = require('vhost');
app.use(vhost('example2.org', app));

// Start up the Node server
app.listen(4100, () => {
  console.log(`Node server listening on 4100`);
});
But when I try to run the second instance I get EADDRINUSE :::4100, so vhost doesn't work here.
Do you know how to fix it?
You can only have one process listening on a given port, not just in Node.js, but generally (with exceptions that don't apply here).
You can achieve what you need in one of two ways:
Combine the node apps
You could make the apps into one application, listen once and then forward requests for each host to separate bits of code - if you wanted to achieve code separation still, the separate bits of code could be NPM modules that are actually written and maintained in isolation.
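A minimal sketch of the combined approach, reusing the vhost middleware from the question (domains and sub-app contents are placeholders):
const express = require('express');
const vhost = require('vhost');

// Each "site" is its own Express app; in a real project these could be
// NPM modules written and maintained in isolation.
const site1 = express();
site1.get('/', (req, res) => res.send('site 1'));

const site2 = express();
site2.get('/', (req, res) => res.send('site 2'));

// One outer app listens once and dispatches by Host header
const app = express();
app.use(vhost('example1.org', site1));
app.use(vhost('example2.org', site2));
app.listen(4100);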
Use webserver to proxy the requests
You could run the 2 node processes on some free port, say 5000 and 5001, and use a webserver to forward requests to it automatically based on host. I'd recommend Nginx for this, as its proxying capabilities are both relatively easy to set up, and powerful. It's also fairly good at not using too many system resources. Apache and others can also be used for this, but my personal preference would be Nginx.
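If you'd rather keep everything in Node, the same host-based forwarding can be sketched with the http-proxy module (hostnames and ports are assumptions):
const http = require('http');
const httpProxy = require('http-proxy');

const proxy = httpProxy.createProxyServer({});

// The two Express apps listen privately on 5000 and 5001
const targets = {
  'example1.org': 'http://127.0.0.1:5000',
  'example2.org': 'http://127.0.0.1:5001'
};

http.createServer((req, res) => {
  // note: the Host header may include a port in some setups
  const target = targets[req.headers.host];
  if (!target) {
    res.statusCode = 502;
    return res.end('Unknown host');
  }
  proxy.web(req, res, { target });
}).listen(4100);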
Conclusion
My recommendation would be that you install a webserver and forward requests on the exposed port to the separately running node processes. I'd actually recommend that you run node behind a proxy by default for a project, and only expose it directly in exceptional circumstances. You get a lot of configuration options, security, and scalability benefits if your app already involves a well hardened server setup.

How to separate express server code from Express business logic code?

All the Node.js tutorials that I have followed have put everything in one file. It includes importing libraries, routing, connecting to the database and starting the server with, say, express.js:
var express = require('express');
var app = express();
app.get('/somePath', blah blah);
app.listen(...);
Now, I have 4 node servers behind an Nginx load balancer. It then becomes very difficult to keep the source code updated on all four servers.
Is there a way to keep the source code out of the server creation code in such a way that I can deploy the source code on the servers as one package? The server creation code should not know anything about routing or database connections. It should only be listening to changes in a folder and the moment a new module meta file appears, it starts hosting that web application.
Much like how we deploy Java code packaged as a WAR by Maven and deployed to Tomcat's webapps folder, because Tomcat instantiation is not part of the source code. In node.js it seems the server is also part of the source code.
For now, the packaging is not my concern. My concern is how to separate the logic and how do I point all my servers to one source code base?
Node.js, or JavaScript for that matter, doesn't have a concept like WAR, but it does have something similar. To achieve something WAR-like, you would essentially bundle the code into one source file using something like webpack. However, this will probably not work with core Node.js modules like http (which Express uses), since those likely call or rely on native V8/C++ functions and libraries.
You could also use Docker and think of the Docker containers as WARs.
Here is what I figured out as a work around:
Keep the servers under a folder, say "server_clusters", and put the different node servers there, namely: node1.js, node2.js, node3.js, node4.js, etc (I know, in the real world, the clusters would be different VMs or CPUs altogether, but for now I simply want to separate server creation logic from source code). These files would have this code snippet:
var constants = require('./prop');
var appBasePath = constants.APP_BASE_DIR;
var appFilePath = appBasePath + "/main";
var app = require(appFilePath);

// each server would have just a different port number while everything else stays constant
app.listen(8080, function () {
  console.log("server started up");
});
Create a properties file that has the path to the source code and exports the object. It's that simple. This is what is required on line#1 in the above code.
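A minimal sketch of that properties file (the directory path is an assumption):
// prop.js — exports the location of the deployed source code
module.exports = {
  APP_BASE_DIR: '/var/deploy/myapp/src' // wherever the source package lands
};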
Create the source directory project wherever you want on the machine and just update its home directory in the constant file above. The source code directory can export one landing file that will provide the express app to the servers to start:
var express = require('express');
var app = express();
module.exports = app;
With this, there are multiple servers that are pointing to the same source code.
Hope this helps to those who are facing the same problem.
Other approaches are welcome.

Node.js + Socket.IO scaling with redis + cluster

Currently, I'm faced with the task where I must scale a Node.js app using Amazon EC2. From what I understand, the way to do this is to have each child server use all available processes using cluster, and have sticky connections to ensure that every user connecting to the server is "remembered" as to which worker their data is currently on from previous sessions.
After doing this, the next best move from what I know is to deploy as many servers as needed, and use nginx to load balance between all of them, again using sticky connections to know which "child" server each user's data is on.
So when a user connects to the server, is this what happens?
Client connection -> Find/Choose server -> Find/Choose process -> Socket.IO handshake/connection etc.
If not, please allow me to better understand this load balancing task. I also do not understand the importance of redis in this situation.
Below is the code I'm using to use all CPUs on one machine, each running a separate Node.js process:
var express = require('express'),
    cluster = require('cluster'),
    net = require('net'),
    sio = require('socket.io'),
    sio_redis = require('socket.io-redis');

var port = 3502,
    num_processes = require('os').cpus().length;
if (cluster.isMaster) {
  // This stores our workers. We need to keep them to be able to reference
  // them based on source IP address. It's also useful for auto-restart,
  // for example.
  var workers = [];

  // Helper function for spawning worker at index 'i'.
  var spawn = function(i) {
    workers[i] = cluster.fork();

    // Optional: Restart worker on exit
    workers[i].on('exit', function(code, signal) {
      console.log('respawning worker', i);
      spawn(i);
    });
  };

  // Spawn workers.
  for (var i = 0; i < num_processes; i++) {
    spawn(i);
  }

  // Helper function for getting a worker index based on IP address.
  // This is a hot path so it should be really fast. The way it works
  // is by converting the IP address to a number by removing the dots,
  // then compressing it to the number of slots we have.
  //
  // Compared against "real" hashing (from the sticky-session code) and
  // "real" IP number conversion, this function is on par in terms of
  // worker index distribution, only much faster.
  var worker_index = function(ip, len) {
    var s = '';
    for (var i = 0, _len = ip.length; i < _len; i++) {
      if (ip[i] !== '.') {
        s += ip[i];
      }
    }
    return Number(s) % len;
  };

  // Create the outside facing server listening on our port.
  var server = net.createServer({ pauseOnConnect: true }, function(connection) {
    // We received a connection and need to pass it to the appropriate
    // worker. Get the worker for this connection's source IP and pass
    // it the connection.
    var worker = workers[worker_index(connection.remoteAddress, num_processes)];
    worker.send('sticky-session:connection', connection);
  }).listen(port);
} else {
  // Note we don't use a port here because the master listens on it for us.
  var app = express();

  // Here you might use middleware, attach routes, etc.

  // Don't expose our internal server to the outside.
  var server = app.listen(0, 'localhost'),
      io = sio(server);

  // Tell Socket.IO to use the redis adapter. By default, the redis
  // server is assumed to be on localhost:6379. You don't have to
  // specify them explicitly unless you want to change them.
  io.adapter(sio_redis({ host: 'localhost', port: 6379 }));

  // Here you might use Socket.IO middleware for authorization etc.
  console.log("Listening");

  // Listen to messages sent from the master. Ignore everything else.
  process.on('message', function(message, connection) {
    if (message !== 'sticky-session:connection') {
      return;
    }

    // Emulate a connection event on the server by emitting the
    // event with the connection the master sent us.
    server.emit('connection', connection);
    connection.resume();
  });
}
I believe your general understanding is correct, although I'd like to make a few comments:
Load balancing
You're correct that one way to do load balancing is having nginx load balance between the different instances, and inside each instance have cluster balance between the worker processes it creates. However, that's just one way, and not necessarily always the best one.
Between instances
For one, if you're using AWS anyway, you might want to consider using ELB. It was designed specifically for load balancing EC2 instances, and it makes the problem of configuring load balancing between instances trivial. It also provides a lot of useful features, and (with Auto Scaling) can make scaling extremely dynamic without requiring any effort on your part.
One feature ELB has, which is particularly pertinent to your question, is that it supports sticky sessions out of the box - just a matter of marking a checkbox.
However, I have to add a major caveat, which is that ELB can break socket.io in bizarre ways. If you just use long polling you should be fine (assuming sticky sessions are enabled), but getting actual websockets working is somewhere between extremely frustrating and impossible.
Between processes
While there are a lot of alternatives to using cluster, both within Node and without, I tend to agree cluster itself is usually perfectly fine.
However, one case where it does not work is when you want sticky sessions behind a load balancer, as you apparently do here.
First off, it should be made explicit that the only reason you even need sticky sessions in the first place is because socket.io relies on session data stored in-memory between requests to work (during the handshake for websockets, or basically throughout for long polling). In general, relying on data stored this way should be avoided as much as possible, for a variety of reasons, but with socket.io you don't really have a choice.
Now, this doesn't seem too bad, since cluster can support sticky sessions, using the sticky-session module mentioned in socket.io's documentation, or the snippet you seem to be using.
The thing is, since these sticky sessions are based on the client's IP, they won't work behind a load balancer, be it nginx, ELB, or anything else, since all that's visible inside the instance at that point is the load balancer's IP. The remoteAddress your code tries to hash isn't actually the client's address at all.
That is, when your Node code tries to act as a load balancer between processes, the IP it tries to use will just always be the IP of the other load balancer, that balances between instances. Therefore, all requests will end up at the same process, defeating cluster's whole purpose.
You can see the details of this issue, and a couple of potential ways to solve it (none of which particularly pretty), in this question.
The importance of Redis
As I mentioned earlier, once you have multiple instances/processes receiving requests from your users, in-memory storage of session data is no longer sufficient. Sticky sessions are one way to go, although other, arguably better solutions exist, among them central session storage, which Redis can provide. See this post for a pretty comprehensive review of the subject.
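For illustration, central session storage in Express is commonly wired up along these lines (connect-redis's exact API has changed between versions, so treat this as a sketch):
var session = require('express-session');
var RedisStore = require('connect-redis')(session); // older connect-redis API; newer versions differ
var redis = require('redis');

app.use(session({
  store: new RedisStore({ client: redis.createClient() }),
  secret: 'keyboard cat', // placeholder
  resave: false,
  saveUninitialized: false
}));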
Seeing as your question is about socket.io, though, I'll assume you probably meant Redis's specific importance for websockets, so:
When you have multiple socket.io servers (instances/processes), a given user will be connected to only one such server at any given time. However, any of the servers may, at any time, wish to emit a message to a given user, or even a broadcast to all users, regardless of which server they're currently under.
To that end, socket.io supports "Adapters", of which Redis is one, that allow the different socket.io servers to communicate among themselves. When one server emits a message, it goes into Redis, and then all servers see it (Pub/Sub) and can send it to their users, making sure the message will reach its target.
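Concretely, once each server attaches the adapter (as in the snippet above), an emit on any one of them reaches clients everywhere:
// Runs on any server in the pool; the redis adapter publishes the emit,
// so members of 'room-1' receive it regardless of which server they're on.
io.to('room-1').emit('announcement', { text: 'visible cluster-wide' });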
This, again, is explained in socket.io's documentation regarding multiple nodes, and perhaps even better in this Stack Overflow answer.

Can I define Express routes in a child process?

So I run a bunch of a little chatbots written in node, nothing too exciting. However, I recently decided to give them their own little web page to display information in a graphical manner. To do this, I figured I'd just run express.
However, I'm running my bots with a wrapper file that starts each chatbot as a child process. Which makes using express a little tricky. Currently I'm starting the express server in the wrapper.js file like so:
var express = require("express");
var web = express();
web.listen(3001);
And then in the child processes, I'm doing this:
var express = require("express");
var web = express();
web.get("/urlforbot",function (req,res) {
res.send("Working!");
});
However, when I navigate to :3001/urlforbot, I get Cannot GET /urlforbot.
Any idea what I'm doing wrong and how to fix this?
Edit: This is my complete wrapper file: http://snippi.com/s/3vn56m2
Edit 2: This is what I'm doing now. I'm hosting each bot on its own port and storing that information in the configs. This is the code I'm using, and it appears to be working:
web.get("/"+cfg.route, function (req,res) { // forward the data
res.redirect('http://url.com:'+cfg.port+"/"+cfg.route);
});
Since your bots run as separate processes (any particular reason?), you have to treat each one as having to implement its own HTTP server with Express:
var express = require("express");
var web = express();
web.get("/urlforbot",function (req,res) {
res.send("Working!");
});
web.listen(UNIQUE_PORT_NUMBER);
Each bot process needs to listen on a unique port number; ports can't be shared.
Next, you need to map requests coming in on port 3001 in the 'master' process to the correct child process' Express server.
node-http-proxy has a useful option called a ProxyTable with which to create such a mapping, but it requires the master process to know what the endpoint (/urlforbot in your terms) for each bot is. It also requires that the master knows on which port the bots are listening.
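A rough sketch of that mapping in the master process; the routes and ports are placeholders, and this uses http-proxy's newer proxy.web() API rather than the old ProxyTable option:
var http = require('http');
var httpProxy = require('http-proxy');

var proxy = httpProxy.createProxyServer({});

// The master has to know each bot's route and the port its own
// Express server listens on.
var table = {
  '/urlforbot': 'http://127.0.0.1:3101',
  '/otherbot': 'http://127.0.0.1:3102'
};

http.createServer(function (req, res) {
  var prefix = '/' + req.url.split('/')[1];
  var target = table[prefix];
  if (!target) {
    res.statusCode = 404;
    return res.end('No such bot');
  }
  proxy.web(req, res, { target: target });
}).listen(3001);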
EDIT: alternatively, you can use child_process.fork to fork a new process for each of your bots, and communicate between them and the master process (port numbers and such, or even all the data required to generate the /urlforbot pages) using the comm channel that Node provides, but that still sounds like an overly complex setup.
Wouldn't it be possible to create a Bot class instead? You'd instantiate the class for each bot you want to run, and that instance loads its specific configuration and adds its routes to the Express server. All from the same process.
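That single-process design could look roughly like this (the Bot class and its configs are hypothetical):
var express = require('express');
var web = express();

// Hypothetical Bot class: each instance loads its own config and
// registers its routes on the shared Express app.
function Bot(cfg, app) {
  this.cfg = cfg;
  app.get('/' + cfg.route, function (req, res) {
    res.send(cfg.name + ' is working!');
  });
}

// One process, one port, many bots
new Bot({ name: 'bot1', route: 'urlforbot' }, web);
new Bot({ name: 'bot2', route: 'otherbot' }, web);

web.listen(3001);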
