Multiple nodejs workers in docker

I'm very new to docker and productionizing nodejs web apps. However, after some reading I've determined that a good setup would be:
nginx container serving static files, ssl, proxying nodejs requests
nodejs container
postgresql container
However, I'm now trying to tackle scalability. Seeing as you can define multiple proxy_pass statements in an nginx config, could you not spin up a duplicate nodejs container (exactly the same but exposing a different port) and effectively "load balance" your web app? Is it a good architecture?
Also, how would this affect database writes? Are there race conditions I need to specifically architect for? Any guidance would be appreciated.

Yes, it's possible to use Nginx to load balance requests between different instances of your Node.js services. Each Node.js instance could be running in a different Docker container. Increasing the scalability of your configuration is as easy as starting up another Docker container and ensuring it's registered in the Nginx config. (Depending on how often you have to update the Nginx config, a variety of tools/frameworks are available to do this last step automatically.)
For example, below is an Nginx configuration to load balance incoming requests across different Node.js services. In our case, we have multiple Node.js services running on the same machine, but it's perfectly possible to use Docker containers instead.
File /etc/nginx/sites-enabled/apps:
upstream apps-cluster {
    least_conn;
    server localhost:8081;
    server localhost:8082;
    server localhost:8083;
    keepalive 512;
}

server {
    listen 8080;
    location "/" {
        proxy_next_upstream error timeout http_500 http_502 http_503 http_504;
        proxy_set_header Connection "";
        proxy_http_version 1.1;
        proxy_pass http://apps-cluster;
    }
    access_log off;
}
Despite running multiple instances of your Node.js services, your database should not be negatively affected. PostgreSQL itself handles many concurrent connections without problems and resolves conflicting concurrent writes through its locking and transaction mechanisms. From a developer point of view, the code for running 1 Node.js service is the same as for running x Node.js services.
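As a rough sketch of what each instance behind that upstream could look like, here is a minimal Node.js HTTP server that reads its port from an environment variable (the PORT variable and the response body are assumptions for illustration); the same image can then be started three times as the containers listening on 8081, 8082 and 8083:
// server.js - hypothetical upstream instance
const http = require('http');

// Each container gets its own port via the environment, e.g. PORT=8081
const port = process.env.PORT || 8081;

http.createServer((req, res) => {
  // Report which instance answered, handy for verifying the load balancing
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end(`Handled by instance on port ${port}\n`);
}).listen(port, () => {
  console.log(`Node.js instance listening on ${port}`);
});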

You can set "Function Level Concurrent Execution Limit" on the function you are using to connect to RDS. This will contain the number of RDS connections. The requests from Dynamo will be throttled though.
Another option is to stream them into Kinesis or SQS from this lambda and have another worker lambda to read it from there and pump the data into RDS. This is scalable and reliable with no throttling.

Related

Next.js + API on same server

Looking at the following scenario, I want to know if this can be considered a good practice in terms of architecture:
A React application that uses the NextJS framework to render on the server. As the data of the application changes often, it uses Server-side Rendering ("SSR" or "Dynamic Rendering"). In terms of code, it fetches data in the getServerSideProps() function, meaning it will be executed by the server on every request.
In the same server instance, there is an API application running in parallel (it can be a NodeJS API, Python Flask app, ...). This app is responsible for querying a database, preparing models and applying any transformations to the data. The API can be accessed from the outside.
My question is: How can NextJS communicate with the API app internally? Is it a correct approach for it to send requests via a localhost port? If not, is there an alternative that doesn't require NextJS to send external HTTP requests back to the same server it runs on?
One of the key requirements is that each app must remain independent. They are currently running on the same server, but they come from different code-repositories and each has its own SDLC and release process. They have their own domain (URL) and in the future they might live on different server instances.
I know that in NextJS you can use libraries such as Prisma to query a database. But that is not an option. Data modeling is managed by the API and I want to avoid duplication of effort. Also, once the NextJS app is rendered on the client side, React will continue calling the API app via normal HTTP requests. That keeps the front-end experience dynamic.
This is a very common scenario when the frontend application runs independently from the backend. A reverse proxy usually helps here.
Here is a simple way I would suggest to achieve this (and it is also one of the best ways):
Use different ports for your frontend and backend applications.
All API routes should start with a specific prefix like /api, and your frontend routes must not start with /api.
Use a web server (I mostly use Nginx, which helps with virtual hosts, reverse proxying, load balancing and also serving static content).
So in your Nginx config file, add/update the following location blocks:
## For the backend/API, assuming the backend is running on port 3000 on the same server
location /api {
    proxy_pass http://localhost:3000;

    ## The following lines are required if you are using WebSockets;
    ## I usually add them even when not using WebSockets
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_set_header Host $host;
    proxy_cache_bypass $http_upgrade;
}

## For the frontend, assuming the frontend is running on port 5000 on the same server
location / {
    proxy_pass http://localhost:5000;
}
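To answer the internal-communication part of the question: with this setup, getServerSideProps() can call the API directly over localhost on its internal port, while the browser keeps using the public /api route through Nginx. A minimal sketch, assuming a hypothetical /api/posts endpoint on port 3000:
// pages/posts.js - hypothetical Next.js page (endpoint name and port are assumptions)
export async function getServerSideProps() {
  // Server-side: talk to the API over the loopback interface, no round trip through Nginx
  const res = await fetch('http://localhost:3000/api/posts');
  const posts = await res.json();
  return { props: { posts } };
}

export default function Posts({ posts }) {
  // Client-side: subsequent requests from the browser go through the public /api route
  return <ul>{posts.map((p) => <li key={p.id}>{p.title}</li>)}</ul>;
}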

Scaling Socket.io Node.js App using Cloud Foundry and NginX Build Pack

I am trying to scale my Socket.io Node.js server horizontally using Cloud Foundry (on IBM Cloud).
As of now, my manifest.yml for cf looks like this:
applications:
- name: chat-app-server
  memory: 512M
  instances: 2
  buildpacks:
    - nginx_buildpack
This way the deployment goes through, but of course the socket connections between client and server fail because the connection is not sticky.
The official Socket.io documentation gives an example for using NginX for using multiple nodes.
When using a custom nginx.conf file based on the Socket.io template, I am missing some information (highlighted with ???).
events { worker_connections 1024; }

http {
    server {
        listen {{port}};
        server_name ???;

        location / {
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $host;
            proxy_pass http://nodes;

            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
        }
    }

    upstream nodes {
        # enable sticky session based on IP
        ip_hash;

        server ???:???;
        server ???:???;
    }
}
I've tried to find out where cloud foundry runs the two instances specified in the manifest.yml file with no luck.
How do I get the required server addresses/ports from cloud foundry?
Is there a way to obtain this information dynamically from CF?
I am deploying my application using cf push.
I haven't used Socket.IO before, so I may be off base, but from a quick read of the docs, it seems like things should just work.
Two points from the docs:
a.) When using WebSockets, this is a non-issue. Cloud Foundry fully supports WebSockets. Hopefully, most of your clients can do that.
b.) When falling back to long polling, you need sticky sessions. Cloud Foundry supports sticky sessions out-of-the-box, so again, this should just work. There is one caveat, though, regarding CF's support of sticky sessions: it expects the session cookie name to be JSESSIONID.
Again, I'm not super familiar with Socket.IO, but I suspect it's probably using a different session cookie name by default (most things outside of Java do). You just need to change the session cookie name to JSESSIONID and sticky sessions should work.
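If the server is on a recent Socket.IO version (v3/v4), the cookie can be enabled and renamed through the server options; a minimal sketch, assuming Gorouter only needs the JSESSIONID cookie name for stickiness:
// server.js - hypothetical Socket.IO setup (assumes Socket.IO v3+)
const http = require('http');
const { Server } = require('socket.io');

const httpServer = http.createServer();

const io = new Server(httpServer, {
  // The session cookie is disabled by default in v3+; enable it and name it
  // JSESSIONID so Cloud Foundry's Gorouter applies its sticky-session support
  cookie: {
    name: 'JSESSIONID',
    httpOnly: true,
    sameSite: 'lax',
  },
});

io.on('connection', (socket) => {
  console.log(`client connected: ${socket.id}`);
});

httpServer.listen(process.env.PORT || 8080);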
TIP: you can check the session cookie name by looking at your cookies in your browser's dev tools.
Final note. You don't need Nginx here at all. Gorouter, which is Cloud Foundry's routing layer, will handle the sticky session support for you.

Do I need a different server to run node.js

Sorry if this is the wrong question for this forum, but I am simply stuck and need some advice. I have a shared hosting service and a cloud-based hosting server with node.js installed. I want to host my website as normal, but I also want to add real-time chat and location tracking using node.js. I am confused by what I am reading in several places: node.js is itself a server but not designed to host websites? So I have to run 2 different servers, one for the website and one to run node.js? When I set up the cloud one with a node.js script running, I can no longer access the webpages.
What's the best way for me to achieve this, as I am just going round in circles? Also, is there a way I can set up a server on my PC and run and test both of these together beforehand, so I can see what is needed and get it working? It would stop me ordering servers I don't need.
Many thanks for any help or advice.
Node can serve webpages using a framework like Express, but can cause conflicts if run on the same port as another webserver program (Apache, etc). One solution could be to serve your webpages through your webserver on port 80 (or 443 for HTTPS) and run your node server on a different port in order to send information back and forth.
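For illustration, here is a minimal Express service listening on its own port next to the existing web server (the port and route are assumptions; this would be the process your chat or location-tracking front-end talks to):
// app.js - hypothetical Node/Express service running alongside the main web server
const express = require('express');
const app = express();

// A simple API route the website's pages could call, e.g. for chat or location data
app.get('/status', (req, res) => {
  res.json({ ok: true, time: Date.now() });
});

// Listen on a port that does not clash with Apache/Nginx on 80/443
app.listen(process.env.PORT || 8000, () => {
  console.log('Node service listening');
});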
There are a number of ways you can achieve this but here is one popular approach.
You can use NGINX as your front facing web server and proxy the requests to your backend Node service.
In NGINX, for example, you will configure your upstream service as follows:
upstream lucyservice {
    server 127.0.0.1:8000;
    keepalive 64;
}
The 8000 you see above is just an example, you may be running your Node service on a different port.
Further in your config (in the server config section) you will proxy the requests to your service as follows:
location / {
    proxy_pass http://lucyservice;
}
Your Node service can be running under a process manager like forever or pm2. You can have multiple Node services running in a cluster, depending on how many processors your machine has.
So to recap - your front facing web server will be handling all traffic on port 80 (HTTP) and or 443 (HTTPS) and this will proxy the requests to your Node service running on whatever port(s) you define. All of this can happen on one single server or multiple if you need / desire.

Running multiple instances of nodejs server for scaling

I am running a nodejs server on port 8080, so my server can only process one request at a time.
I can see that if I send multiple requests in one single shot, new requests are queued and executed sequentially one after another.
What I am trying to find out is how to run multiple instances/threads of this process, for example like gunicorn for Python servers. Is there something similar, instead of running the nodejs server on a separate port for each instance?
I have placed nginx in front of the node process. Is that a sufficient and recommended method?
worker_processes auto;
worker_rlimit_nofile 1100;

events {
    worker_connections 1024;
    multi_accept on;
    use epoll;
}

pid /var/run/nginx.pid;

http {
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;

    server {
        listen 80;
        server_name localhost;

        access_log /dev/null;
        error_log /dev/null;

        location / {
            proxy_pass http://localhost:8080;
        }
    }
}
First off, make sure your node.js process is ONLY using asynchronous I/O. If it's not compute intensive and using asynchronous I/O, it should be able to have many different requests "in-flight" at the same time. The design of node.js is particularly good at this if your code is designed properly. If you show us the crux of what it is doing on one of these requests, we can advise more specifically on whether your server code is designed properly for best throughput.
Second, instrument and measure, measure, measure. Understand where your bottlenecks are in your existing node.js server and what is causing the delay or sequencing you see. Sometimes there are ways to dramatically fix/improve your bottlenecks before you start adding lots more clusters or servers.
Third, use the node.js cluster module. This will create one master node.js process that automatically balances between several child processes. You generally want to create one cluster child for each actual CPU core in your server computer, since that will get you the most use out of your CPUs (a minimal sketch follows this answer).
Fourth, if you need to scale to the point of multiple actual server computers, then you would use either a load balancer or a reverse proxy such as nginx to share the load among multiple hosts. If you had a quad-core CPU in each server, you could run a cluster with four node.js processes on each server computer and then use nginx to balance among the several server boxes you had.
Note that adding multiple hosts that are load balanced by nginx is the last option here, not the first option.
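As referenced above, here is a minimal sketch of the cluster approach (the port and request handler are placeholders); all workers share port 8080, so the nginx config from the question keeps working unchanged:
// cluster-server.js - hypothetical cluster setup, one worker per CPU core
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // Master: fork one worker per core and replace any worker that dies
  os.cpus().forEach(() => cluster.fork());
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} died, starting a new one`);
    cluster.fork();
  });
} else {
  // Workers: all listen on the same port; the master distributes the connections
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(8080);
}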
Like #poke said, you would use a reverse proxy and/or a load balancer in front.
But if you want software that runs multiple instances of node, with balancing and other features, you should check out pm2:
http://pm2.keymetrics.io/
Just a point to add to #sheplu's answer: the pm2 module uses the node cluster module under the hood. Even then, pm2 is a very good choice, as it provides various other abstractions on top of node cluster.
More info on it here: https://pm2.keymetrics.io/docs/usage/pm2-doc-single-page/
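For example, a minimal ecosystem.config.js for pm2's cluster mode might look like the following (the app name and script path are assumptions); pm2 start ecosystem.config.js would then spawn one process per CPU core:
// ecosystem.config.js - hypothetical pm2 configuration
module.exports = {
  apps: [
    {
      name: 'api',            // assumed app name
      script: './server.js',  // assumed entry point
      instances: 'max',       // one instance per available CPU core
      exec_mode: 'cluster',   // use pm2's built-in cluster mode
    },
  ],
};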

How does nginx load balance node instances

I have 3 CPU cores on my machine, and I have 3 instances of node running, one for each core. When I access the server directly, it's the master process that always gets called. However, when I use a reverse nginx proxy, the process is random. How does nginx choose which node process to run?
http://domain.com:1000 -> proxy
http://domain.com:2000 -> node processes
Nginx config:
server {
    listen 1000;
    server_name node;

    location / {
        proxy_pass http://domain.com:2000/;

        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}
Nginx has no knowledge of your back-end's clustering the way you've configured it. As noted in a comment against your original question, Nginx load balancing only applies if you configure it with multiple backends. You are not doing this.
Therefore, if your backend itself is something like NodeJS using the Cluster module, the load balancing is happening there. Node doesn't need any "help" from Nginx to do this - it has its own mechanisms. They're described in https://nodejs.org/api/cluster.html#cluster_how_it_works:
The cluster module supports two methods of distributing incoming connections.
The first one (and the default one on all platforms except Windows), is the round-robin approach, where the master process listens on a port, accepts new connections and distributes them across the workers in a round-robin fashion, with some built-in smarts to avoid overloading a worker process.
The second approach is where the master process creates the listen socket and sends it to interested workers. The workers then accept incoming connections directly.
You can therefore choose how you want this to work in your Node back-end. At the moment, you are telling Nginx to always go to port 2000, therefore whatever you have running there is getting the traffic. If that is the master process for a Node Cluster, it will round-robin load balance. If that is one of the child workers, then that child worker will get ALL the traffic and the others will get none.
