I want to use nginx as a load balancer in front of several node.js application nodes.
Round-robin and ip_hash are unbelievably easy to set up, but in my use case they're not the best fit.
I need nginx to route clients to backend nodes according to their session IDs, which are assigned by the first node they land on.
While googling, I came across the hash method, but I couldn't find many resources on it.
Here is what I tried:
my_site.conf:
http {
    upstream my_servers {
        hash $remote_addr$http_session_id consistent;
        server 127.0.0.1:3000;
        server 127.0.0.1:3001;
        server 127.0.0.1:3002;
    }

    server {
        listen 1234;
        server_name example.com;

        location / {
            proxy_pass http://my_servers;
            proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
            proxy_redirect off;
            proxy_buffering off;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
And in the application, I return a Session-ID header containing the session ID.
res.setHeader('Session-ID', req.sessionID);
I'm missing something, but what?
$http_session_id refers to the header sent by the client (browser), not to your application's response. What you actually need is http://nginx.org/r/sticky, but it's available only in the commercial subscription.
There is a third-party module that does the same as the commercial one, but you'll have to recompile nginx.
It doesn't work out of the box because nginx is a (good) web server, but not a real load balancer.
Prefer HAProxy for load balancing.
Furthermore, what you need is not hashing. You need persistence on a session-id header, and you need to be able to persist on the source IP until you get that header.
This is pretty straightforward with HAProxy. HAProxy can also be used to check whether the session ID was generated by the server or forged by the client.
backend myapp
    # create a stick table in memory
    # (note: use peers to synchronize the content of the table)
    stick-table type string len 32 expire 1h size 1m
    # match the client's http-session-id against the table for persistence
    stick match hdr(http-session-id)
    # if not found, then use the source IP address
    stick on src,lower # dirty trick to turn the IP address into a string
    # learn the http-session-id that has been generated by the server
    stick store-response hdr(http-session-id)
    # add a header if the http-session-id seems to be forged (not found in the table)
    # (note: only available in 1.6-dev)
    acl has-session-id req.hdr(http-session-id) -m found
    acl known-session-id req.hdr(http-session-id),in_table(myapp)
    http-request set-header X-warning unknown\ session-id if has-session-id !known-session-id
Then you are fully secured :)
Baptiste
We are using Node.js + Socket.IO with the polling transport type, because we have to pass a token in the headers to authenticate the client, so I cannot avoid the polling transport.
We are running nginx in front of 4 socket application instances.
I am getting two problems because of this.
When the polling call finishes and upgrades to the websocket transport, I get a 400 Bad Request. As far as I can tell, this is because the second request lands on a different socket server, which rejects the websocket transport.
These connections are re-triggered rapidly, even once the websocket connection is successful.
Problem #2 occurs only when we run multiple instances of the socket server; with a single server it works fine and the connection doesn't terminate.
When using nginx as a load balancer to reverse proxy a multi-instance websocket application, you have to configure nginx so that each time a connection is made to an instance, all consecutive requests from the same client are proxied to the same instance, to avoid unwanted disconnections. Basically, you want to implement sticky sessions.
This is well documented in the Socket.io official documentation.
http {
    server {
        listen 3000;
        server_name io.yourhost.com;

        location / {
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $host;
            proxy_pass http://nodes;

            # enable WebSockets
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
        }
    }

    upstream nodes {
        # enable sticky session with either "hash" (uses the complete IP address)
        hash $remote_addr consistent;
        # or "ip_hash" (uses the first three octets of the client IPv4 address, or the entire IPv6 address)
        # ip_hash;
        # or "sticky" (needs commercial subscription)
        # sticky cookie srv_id expires=1h domain=.example.com path=/;

        server app01:3000;
        server app02:3000;
        server app03:3000;
    }
}
The key line is hash $remote_addr consistent;, declared inside the upstream block.
Note that here there are 3 different socket instances deployed on hosts app01, app02, and app03 (all on port 3000). If you want to run all of your instances on the same host, you should run them on different ports (for example: app01:3001, app02:3002, app03:3003).
Moreover, note that if you have multiple socket server instances with several clients connected, you want client1 connected to ServerA to be able to "see" and communicate with client2 connected to ServerB. To do this, you need ServerA and ServerB to communicate, or at least to share information. Socket.io can handle this for you with a small amount of effort by using a Redis instance and the redis-adapter module. Check this part of the socket.io documentation.
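As a rough sketch of that Redis wiring (assuming Socket.IO v4 with the @socket.io/redis-adapter and redis npm packages and a Redis instance on localhost; the names and versions are assumptions, so adjust to your setup):

```javascript
const { Server } = require('socket.io');
const { createClient } = require('redis');
const { createAdapter } = require('@socket.io/redis-adapter');

// One client publishes, a duplicate subscribes; the adapter relays
// events between Socket.IO instances through Redis pub/sub.
const pubClient = createClient({ url: 'redis://localhost:6379' });
const subClient = pubClient.duplicate();

Promise.all([pubClient.connect(), subClient.connect()]).then(() => {
  const io = new Server(3000, { adapter: createAdapter(pubClient, subClient) });
});
```

With something like this in place, a broadcast from the instance on app01 also reaches clients connected to app02 and app03.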
Final note: both links I shared are from the same socket.io doc page, but they point to different sections. I strongly suggest you read the whole page to get a complete overview of the architecture.
I'm getting "HTTP ERROR 502 Bad Gateway" when I click on a worker link in my standalone Spark UI. Looking at the master logs I can see a corresponding message...
HttpSenderOverHTTP.java:219 Generated headers (4096 bytes), chunk (-1 bytes), content (0 bytes) - HEADER_OVERFLOW/HttpGenerator#231f022d{s=START}
The network infrastructure in front of my Spark UI does indeed generate a header that is bigger than 4096 bytes, and the Spark reverse proxy is attempting to pass that to the worker UI. If I bypass that infrastructure the UI works as it should.
After digging into the Spark UI code I believe that the requestBufferSize init parameter of the Jetty ProxyServlet controls this.
Can this be increased at run-time via (say) a Java property? For example, something like...
SPARK_MASTER_OPTS=-Dorg.eclipse.jetty.proxy.ProxyServlet.requestBufferSize=8192 ...
I've tried the above without success -- I'm not familiar enough with Jetty or Servlets in general to know if that's even close to valid. Obviously I'm also looking into ways of reducing the header size but that involves systems that I have much less control over.
(Spark v3.0.2 / Jetty 9.4)
Here's the workaround that I was forced to use: putting a proxy in front of the Spark UI that strips the headers.
I used NGINX with this in the default.conf...
server {
    listen 8080;

    location / {
        proxy_pass http://my-spark-master:8080/;
        proxy_pass_request_headers off;
        proxy_redirect off;
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
I have been fighting this 502 issue for some time now, and indeed it seems to be caused by large headers from the upstream proxy. I solved it by removing headers that aren't required anyway. I reviewed them in the browser, then removed them using:
proxy_set_header Accept-Encoding "";
As an example.
Thanks for the great tip!
Paul
I am trying to scale my Socket.io Node.js server horizontally using Cloud Foundry (on IBM Cloud).
As of now, my manifest.yml for cf looks like this:
applications:
- name: chat-app-server
memory: 512M
instances: 2
buildpacks:
- nginx_buildpack
This way the deployment goes through, but of course the socket connections between client and server fail because the connection is not sticky.
The official Socket.IO documentation gives an example of using nginx with multiple nodes.
When using a custom nginx.conf file based on that template, I am missing some information (highlighted with ???).
events { worker_connections 1024; }

http {
    server {
        listen {{port}};
        server_name ???;

        location / {
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $host;
            proxy_pass http://nodes;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
        }
    }

    upstream nodes {
        # enable sticky session based on IP
        ip_hash;
        server ???:???;
        server ???:???;
    }
}
I've tried to find out where cloud foundry runs the two instances specified in the manifest.yml file with no luck.
How do I get the required server addresses/ports from cloud foundry?
Is there a way to obtain this information dynamically from CF?
I am deploying my application using cf push.
I haven't used Socket.IO before, so I may be off base, but from a quick read of the docs, it seems like things should just work.
Two points from the docs:
a.) When using WebSockets, this is a non-issue. Cloud Foundry fully supports WebSockets. Hopefully, most of your clients can do that.
b.) When falling back to long polling, you need sticky sessions. Cloud Foundry supports sticky sessions out-of-the-box, so again, this should just work. There is one caveat, though, regarding CF's support of sticky sessions: it expects the session cookie name to be JSESSIONID.
Again, I'm not super familiar with Socket.IO, but I suspect it's probably using a different session cookie name by default (most things outside of Java do). You just need to change the session cookie name to JSESSIONID and sticky sessions should work.
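For example, if the server issues its own session cookie with Node's http module, renaming it is a one-line change (a minimal sketch with an illustrative helper; with express-session you would set the `name` option to 'JSESSIONID' instead):

```javascript
// Gorouter keys sticky sessions off a cookie named JSESSIONID, so a
// non-Java backend only has to issue its session cookie under that name.
function jsessionidCookie(sessionId) {
  return `JSESSIONID=${sessionId}; Path=/; HttpOnly`;
}

// In a plain http request handler:
// res.setHeader('Set-Cookie', jsessionidCookie(sessionId));
```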
TIP: you can check the session cookie name by looking at your cookies in your browser's dev tools.
Final note. You don't need Nginx here at all. Gorouter, which is Cloud Foundry's routing layer, will handle the sticky session support for you.
I'm a bit new to node/react.
I have an API/Express Node app, and inside it a React app. The React app makes API calls with axios.get and similar, and forwards them to the proxy set up in the React app's package.json. In dev the proxy looked like this: "proxy": "http://localhost:3003/", but now that I'm going into production I'm trying to change it to the URL where my Node/Express app is hosted: "proxy": "http://168.235.83.194:83/".
When I moved my project to production, I put the API Node app on port 83 and the React app on port 84 (with nginx). For whatever reason, though, my React app just doesn't make the API requests to the Node app correctly; I'm getting blank data.
After googling, I came to realize that the 'proxy' setting only applies to requests made to the development server. Normally in production you have one server that serves the initial page HTML and also serves API requests, so requests to /api/foo naturally work; you don't need to specify a host.
This is the part I'm trying to figure out. If someone can tell me how to set up my app so that /api/foo naturally works, that would be greatly appreciated.
I took a stab at setting that up properly. This is probably a complete failure as an approach, but it's late and I'm going to fall asleep on this problem. Am I supposed to have nginx serve both the static HTML and the API requests in one config file? I have this so far, but I could be way off:
server {
    listen 84;
    server_name 168.235.83.194;
    root /home/el8le/workspace/notes/client/build;
    index index.html index.htm;

    location / {
    }

    location /api {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-NginX-Proxy true;
        # nginx is hosting my API app on this port; not even sure if this should be like this
        proxy_pass http://168.235.83.194:83/;
        proxy_ssl_session_reuse off;
        proxy_set_header Host $http_host;
        proxy_cache_bypass $http_upgrade;
        proxy_redirect off;
    }
}
Also, I'm actually hosting on those ip addresses if you want to get a better sense of where I am at:
http://168.235.83.194:84
http://168.235.83.194:83/customers
You will have to supply the actual API URL when making data requests. The dev server is able to proxy to a different API URL: if the app loads at http://localhost:84 via the dev server, a data request like /api/customers goes to http://localhost:84/api/customers, and the dev proxy server pipes it to http://localhost:83/api/customers.
But in production, when you make the same request, it will use your app's base address and try to get the data from http://PRODUCTION_SERVER:84/api/customers, where no API is listening.
The correct way to handle this is to use an absolute URL instead of a relative one. Since production and development have different base URLs, keep them in a config variable and append the specific API path to that base, something like ${BASE_URL}/api/customers, where BASE_URL is http://localhost:83 in dev and http://PRODUCTION_SERVER:83 in production.
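A hedged sketch of that config-variable approach (the helper names are illustrative, and it assumes the API is the app on port 83 from the question):

```javascript
// Choose the API base URL per environment; the production IP and the
// port are taken from the question and are otherwise assumptions.
function apiBaseUrl(env) {
  return env === 'production'
    ? 'http://168.235.83.194:83'
    : 'http://localhost:83';
}

// Append a specific API path to the environment's base URL.
function apiUrl(env, path) {
  return `${apiBaseUrl(env)}${path}`;
}

// e.g. axios.get(apiUrl(process.env.NODE_ENV, '/api/customers'))
```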
I have a node.js server running behind an nginx proxy. node.js is running an HTTP 1.1 (no SSL) server on port 3000. Both are running on the same server.
I recently set up nginx to use HTTP2 with SSL (h2). It seems that HTTP2 is indeed enabled and working.
However, I want to know whether the fact that the proxy connection (nginx <--> node.js) is using HTTP 1.1 affects performance. That is, am I missing the HTTP2 benefits in terms of speed because my internal connection is HTTP 1.1?
In general, the biggest immediate benefit of HTTP/2 is the speed increase offered by multiplexing for browser connections, which are often hampered by high latency (i.e. slow round-trip times). Multiplexing also reduces the need for (and expense of) multiple connections, which is a workaround used in HTTP/1.1 to achieve similar performance benefits.
For internal connections (e.g. between webserver acting as a reverse proxy and back end app servers) the latency is typically very, very, low so the speed benefits of HTTP/2 are negligible. Additionally each app server will typically already be a separate connection so again no gains here.
So you will get most of your performance benefit from just supporting HTTP/2 at the edge. This is a fairly common set up - similar to the way HTTPS is often terminated on the reverse proxy/load balancer rather than going all the way through.
However, there are potential benefits to supporting HTTP/2 all the way through. For example, it could allow server push all the way from the application. There are also potential benefits from reduced packet size on that last hop, due to the binary nature of HTTP/2 and header compression; though, like latency, bandwidth is typically less of an issue for internal connections, so the importance of this is arguable.
Finally, some argue that a reverse proxy does less work connecting an HTTP/2 connection to another HTTP/2 connection than to an HTTP/1.1 connection, as there is no need to convert one protocol to the other; though I'm sceptical whether that's even noticeable, since they are separate connections (unless it's acting simply as a TCP pass-through proxy).
So, to me, the main reason for end-to-end HTTP/2 is to allow end-to-end server push, but even that is probably better handled with HTTP Link headers and 103 Early Hints, due to the complications of managing push across multiple connections. I'm also not aware of any HTTP proxy server that supports this (few enough support HTTP/2 at the backend, never mind chaining HTTP/2 connections like this), so you'd need a layer-4 load balancer forwarding TCP packets rather than chaining HTTP requests, which brings other complications.
For now, while servers are still adding support and server push usage is low (and best practice is still being worked out), I would recommend having HTTP/2 only at the edge. At the time of writing, nginx doesn't support HTTP/2 for proxy_pass connections (though Apache does) and has no plans to add this, and they make an interesting point about whether a single HTTP/2 connection might introduce slowness (emphasis mine):
Is HTTP/2 proxy support planned for the near future?
Short answer:
No, there are no plans.
Long answer:
There is almost no sense to implement it, as the main HTTP/2 benefit is that it allows multiplexing many requests within a single connection, thus [almost] removing the limit on number of simultaneous requests - and there is no such limit when talking to your own backends. Moreover, things may even become worse when using HTTP/2 to backends, due to single TCP connection being used instead of multiple ones.
On the other hand, implementing HTTP/2 protocol and request multiplexing within a single connection in the upstream module will require major changes to the upstream module.
Due to the above, there are no plans to implement HTTP/2 support in the upstream module, at least in the foreseeable future. If you still think that talking to backends via HTTP/2 is something needed - feel free to provide patches.
Finally, it should also be noted that, while browsers require HTTPS for HTTP/2 (h2), most servers don't, and so could support this final hop over plain HTTP (h2c). So there would be no need for end-to-end encryption if it is not present on the Node part (as it often isn't). Though, depending on where the backend server sits in relation to the front-end server, using HTTPS even for this connection is perhaps something that should be considered if traffic will be travelling across an unsecured network (e.g. CDN to origin server across the internet).
EDIT AUGUST 2021
HTTP/1.1 being text-based rather than binary makes it vulnerable to various request smuggling attacks. At Defcon 2021, PortSwigger demonstrated a number of real-life attacks, mostly related to issues when downgrading front-end HTTP/2 requests to back-end HTTP/1.1 requests. These could probably mostly be avoided by speaking HTTP/2 all the way through, but given the current level of support among front-end servers and CDNs for speaking HTTP/2 to the backend, and among backends for HTTP/2, it seems it'll take a long time for this to become common, and front-end HTTP/2 servers ensuring these attacks aren't exploitable seems like the more realistic solution.
NGINX now supports HTTP/2 server push for proxy_pass and it's awesome...
Here I am pushing favicon.ico, minified.css, minified.js, register.svg, purchase_litecoin.svg from my static subdomain too. It took me some time to realize I can push from a subdomain.
location / {
    http2_push_preload on;

    add_header Link "<//static.yourdomain.io/css/minified.css>; as=style; rel=preload";
    add_header Link "<//static.yourdomain.io/js/minified.js>; as=script; rel=preload";
    add_header Link "<//static.yourdomain.io/favicon.ico>; as=image; rel=preload";
    add_header Link "<//static.yourdomain.io/images/register.svg>; as=image; rel=preload";
    add_header Link "<//static.yourdomain.io/images/purchase_litecoin.svg>; as=image; rel=preload";

    proxy_hide_header X-Frame-Options;
    proxy_http_version 1.1;
    proxy_redirect off;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header Host $http_host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_pass http://app_service;
}
In case someone is looking for a solution when it is not convenient to make your services HTTP/2 compatible: here is a basic nginx configuration you can use to put an HTTP/2 front end on an HTTP/1.1 service.
server {
    listen [::]:443 ssl http2;
    listen 443 ssl http2;
    server_name localhost;

    # "ssl on;" is deprecated; the "ssl" parameter on the listen directives above replaces it
    ssl_certificate /Users/xxx/ssl/myssl.crt;
    ssl_certificate_key /Users/xxx/ssl/myssl.key;

    location / {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
NGINX does not support HTTP/2 as a client. As they're running on the same server and there is no latency or limited bandwidth, I don't think it would make a huge difference either way. I would make sure you are using keepalives between nginx and node.js.
https://www.nginx.com/blog/tuning-nginx/#keepalive
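A minimal sketch of that keepalive tuning (the upstream name, port, and connection count are illustrative):

```nginx
upstream node_backend {
    server 127.0.0.1:3000;
    # keep up to 16 idle connections to the backend open for reuse
    keepalive 16;
}

server {
    listen 443 ssl http2;

    location / {
        proxy_pass http://node_backend;
        # connection reuse requires HTTP/1.1 and a cleared Connection header
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
```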
You are not losing performance in general, because nginx matches the request multiplexing the browser does over HTTP/2 by making multiple simultaneous requests to your Node backend. (One of the major performance improvements of HTTP/2 is allowing the browser to make multiple simultaneous requests over the same connection, whereas in HTTP/1.1 only one simultaneous request per connection is possible. Browsers limit the number of connections, too.)