NodeJS load test poor performance (EADDRNOTAVAIL)

I'm new to web applications with NodeJS, and there is one problem with my app I don't know how to solve.
The application (we use Express.js) runs smoothly on my local machine, but when we deploy it to our dev server for load testing, we get an error like this:
Error: connect EADDRNOTAVAIL
at errnoException (net.js:770:11)
at connect (net.js:646:19)
at Socket.connect (net.js:711:9)
at asyncCallback (dns.js:68:16)
at Object.onanswer [as oncomplete] (dns.js:121:9)
GET XXXXXXX 500 21ms
Our application does not have a database; it talks to a REST API backend, and every page we build needs one or more calls to that backend. I know we should use a caching system, but we want to test without one first.
Our load test simulates user navigation. It starts with 5 users and adds another user every minute. Once we have more than 25 users, we begin to see the error in our logs.
At first I thought it could be a problem of too many open connections, but our sysadmins say that's not the case.
So it would be great if anyone could give me a hint about where I should look.
EDIT: Our dev machine has 16 cores and we're running our application using the cluster module. Calls to the backend are made with Mikeal's popular request module.

As robertklep suggested, this is a problem of the OS running out of ephemeral (virtual) ports when opening too many outgoing connections. Follow his link for a detailed explanation.
When I increased the port range as the article suggests, I still had the problem. With some more googling I found out about problems with the garbage collector and Node's network objects. When you need many, many outgoing connections, it seems to be a good idea to trigger the garbage collector manually.
Check out this post.
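The post linked above concerns triggering the GC manually (running node with --expose-gc and calling global.gc() periodically). A different mitigation is also worth noting here, as a hedged sketch only: reuse outgoing sockets instead of opening a fresh one per backend call, so the OS never burns through its ephemeral port range. This uses Node's built-in http.Agent; the backend host and path are placeholders:

const http = require('http');

// A keep-alive agent holds a bounded pool of sockets and reuses them,
// instead of opening (and leaving in TIME_WAIT) a new ephemeral port
// for every single backend call.
const keepAliveAgent = new http.Agent({
  keepAlive: true,
  maxSockets: 100, // cap on concurrent sockets per backend host
});

// Hypothetical backend endpoint; replace with your REST API's host/path.
http.get(
  { host: 'backend.example.com', path: '/users/42', agent: keepAliveAgent },
  (res) => {
    let body = '';
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => console.log('backend replied:', res.statusCode));
  }
).on('error', (err) => console.error('backend call failed:', err));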

If you're sure it is not a problem in your program, you can change the Linux system configuration to work around it:
$ vim /etc/sysctl.conf
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
$ sysctl -p
(Be aware that net.ipv4.tcp_tw_recycle is known to break clients behind NAT, and it was removed entirely in Linux 4.12.)

Related

Node app keeps crashing due to exhausted memory after migrating to a Windows environment

I am working on a React site that was originally built by someone else. The app uses the WordPress REST API for handling content.
Currently the live app sits on an nginx server running Node v6 and has been working just fine. However, now I have to move the app over to an IIS environment (not by choice) and have had nothing but problems with it. I finally got the app to run as expected, which is great, but now I am running into an issue with Node's memory becoming exhausted.
While debugging this issue I noticed the server's firewall was polling the home route every 5-10 seconds, which fired an API request to the WordPress API each time. The API would then return a pretty large JSON object of data.
So my conclusion was that the firewall polling the home route too often was killing the memory, because the app had to constantly fire API requests and load in huge sets of data over and over.
My solution was to set up a polling route on the Node server (Express) which would just return a 200 response and nothing else. This seemed to fix the issue, as the app went from crashing every few hours to lasting over two days. However, after about two days the app crashed again with another memory error. The error looked like this:
Since the app lasted much longer with the polling route added, I assume the firewall polling was/is in fact my issue. But now that I've added the polling route and the app still crashed after a couple of days, I have no idea what to do, which is why I am asking for help.
I am very unfamiliar with working on Windows, so I don't know if there are any memory restrictions or obvious things I could do to help prevent this issue.
Some other notes: I have tried increasing --max-old-space-size to about 8000, but it didn't seem to do anything, so I don't know if I am implementing it wrong. These are the commands I have tried when starting the app:
Start-Process npm -ArgumentList "run server-prod --max-old-space-size=8192" -WorkingDirectory C:\node\prod
And when I used forever to handle the process:
forever start -o out.log -e error.log .\lib\server\server.js -c "node --max_old_space_size=8000"
Any help on what the issue could be, or tips on what I should look for, would be great. Again, I am very new to working on Windows, so maybe there is just something I am missing.
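One thing worth noting, as an observation rather than a confirmed diagnosis: in the Start-Process command the flag sits inside npm's argument string, so it is consumed by npm rather than by the node process running the app and likely never takes effect. Setting the NODE_OPTIONS environment variable (Node 8+) or invoking node directly with the flag avoids that. Separately, a lightweight health-check route plus periodic heap logging is a common way to confirm this kind of leak; a minimal Express sketch, where the route path and interval are arbitrary:

const express = require('express');
const app = express();

// Cheap route for the firewall's poller: returns 200 with no body,
// so each poll no longer triggers a large WordPress API fetch.
app.get('/healthz', (req, res) => res.sendStatus(200));

// Log heap usage every minute; a steadily climbing heapUsed between
// crashes points at a leak rather than a one-off spike.
setInterval(() => {
  const m = process.memoryUsage();
  console.log(
    'rss=' + (m.rss / 1048576).toFixed(1) + 'MB',
    'heapUsed=' + (m.heapUsed / 1048576).toFixed(1) + 'MB',
    'heapTotal=' + (m.heapTotal / 1048576).toFixed(1) + 'MB'
  );
}, 60 * 1000);

app.listen(3000);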

Is there a hard limit on socket.io connections?

Background
We have a server running socket.io 2.0.4. This server receives petitions from a stress script that simulates clients using socket.io-client 2.0.4.
The script simulates the creation of clients (each with its own socket) that send a single petition and immediately die afterwards, via socket.disconnect();
Problem
During the first few seconds all goes well, but every test reaches a point at which the script starts spitting out the following error:
connect_error: Error: websocket error
This means the clients my script is creating are simply unable to connect to the server.
The script creates 7 clients per second (spaced evenly throughout the second); each client makes 1 petition and then dies, as sketched below.
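For context, the shape of such a script is roughly as follows (a sketch assuming socket.io-client 2.x; the endpoint and the 'petition' event name are placeholders):

const io = require('socket.io-client');

// Spawn 7 short-lived clients per second, spaced ~143 ms apart.
setInterval(() => {
  const socket = io('http://localhost:3000', { transports: ['websocket'] });

  socket.on('connect', () => {
    socket.emit('petition', { ts: Date.now() }); // the single request
    socket.disconnect(); // client dies immediately afterwards
  });

  socket.on('connect_error', (err) => {
    console.error('connect_error:', err.message);
    socket.close();
  });
}, 1000 / 7);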
Research
At first I thought there was an issue with file descriptors and the limits imposed by UNIX, since the server is a Debian machine:
https://github.com/socketio/socket.io/issues/1393
After following these suggestions, however, the issue remained.
Then I thought maybe my test script was not connecting correctly, so I changed the connection options as in this discussion:
https://github.com/socketio/socket.io-client/issues/1097
Still, to no avail.
What could be wrong?
I see the machine's CPUs are constantly at 100%, so I guess I am pounding the server with requests.
But if I am not mistaken, the server should simply accept more requests and process them when possible.
Questions
Is there a limit to the amount of connections a socket.io server can handle?
When making such stress tests, one needs to be aware of the protections and gatekeepers along the way.
In our case, our stack was deployed on AWS. So first, the AWS load balancers started blocking us because they thought the system was being DDoSed.
Then the Debian system itself was getting flooded and started refusing connections, treating them as a SYN flood.
But after fixing that we were still getting the error. It turned out we also had to increase the TCP connection backlog and change how TCP connections were handled in the kernel.
Now it accepts all connections, but I wish on no one the suffering we went through to figure it out...
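The exact keys depend on the kernel version, but the knobs involved are typically along these lines; the values below are illustrative, not prescriptive:

# /etc/sysctl.conf -- illustrative values only
net.core.somaxconn = 4096            # accept queue length per listening socket
net.ipv4.tcp_max_syn_backlog = 4096  # half-open connections allowed before SYN-flood protection trips
net.core.netdev_max_backlog = 4096   # packets queued off the NIC before the kernel drops them
# apply with: sysctl -p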

How to direct a user to an available websocket server when she logs in to my multi-server Node.js app?

This is more like a design question but I have no idea where to start.
Suppose I have a realtime Node.js app that runs on multiple servers. When a user logs in, she doesn't know which server she will be assigned to; she just logs in, does something, and logs out, and that's it. A user won't be interacting with users on a different server, nor will her details be stored on another server.
In the backend I assume the Node.js server will put the user's login details into some queue, and when there is space it will assign the user to an available server (the server with the lowest ping, or one that is not full). Because there is a limit to the number of users on one physical server, when a user tries to log in to a "full" server she will be directed to another available one.
I am using the ws module of Node.js. Is there any service available for this purpose, or do I have to build my own? How difficult would that be?
I am not sure how WebSocket fits into this question, so I'll ignore it. I guess your actual question is about load balancing... Let me try paraphrasing it.
Q: Does NodeJS have any load balancing feature that I can leverage?
Yes, and it is called cluster in NodeJS. Instead of the traditional single node process listening on a single port, this module allows you to spawn a group of node processes and have them all bound to the same port.
This means that all the user knows is the service's endpoint. He sends a request to it, and one of the available processes in the group serves it whenever possible.
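A minimal sketch of the cluster pattern (the port number is arbitrary):

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU core; they all share the same listening port.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  // Replace any worker that dies so capacity stays constant.
  cluster.on('exit', (worker) => {
    console.log('worker ' + worker.process.pid + ' died, forking a new one');
    cluster.fork();
  });
} else {
  http.createServer((req, res) => {
    res.end('handled by worker ' + process.pid + '\n');
  }).listen(3000); // the master distributes incoming connections to workers
}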
Alternatively, using the Nginx web server as your load balancer is also a very popular approach to this problem.
References:
Cluster API: https://nodejs.org/api/cluster.html
Nginx as load balancer: http://nginx.org/en/docs/http/load_balancing.html
P.S.
I guess the keyword for googling solutions to your problem is load balancer.
Of the two solutions, I would recommend going the Nginx way, as it is the more scalable approach: your Node processes can then be spread across multiple hosts (horizontal scaling), whereas the cluster solution is more for vertical scaling, taking advantage of a multi-core machine.
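For reference, the Nginx side of this amounts to a handful of lines. The host addresses below are placeholders, and the Upgrade headers matter because this question is about WebSockets:

# nginx.conf -- hypothetical upstream of three Node hosts
upstream node_app {
    least_conn;              # route each request to the least busy backend
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
    server 10.0.0.3:3000;
}

server {
    listen 80;
    location / {
        proxy_pass http://node_app;
        # needed for WebSocket upgrade handshakes to pass through
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}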

How to detect and possibly ignore processing a bad/hung client browser request

I'm developing a Node web application, and while testing, one of the client Chrome browsers went into a hung state. The browser entered an infinite loop where it continuously downloaded all the JavaScript files referenced by the HTML page. I rebooted the web server (node.js), but once the web server came back online, it kept receiving tons of requests per second from the same browser.
Obviously, I went ahead and terminated the client browser, and the issue went away.
But I'm concerned about how to handle such problem client connections from the server side once my web application goes live/public, since I will have no access to the clients.
Is there anything (an npm module/code?) that can make a best guess at detecting such bad client connections from within my web server code, and, once one is detected, ignore any future requests from that particular client? I understand that handling this within the Node server might not be the best approach, but at least I could save my CPU/network by not rendering responses to the bad requests.
P.S.
By the way, I'm planning to deploy my Node web application to Heroku on a small budget. So if you know of any firewall/configuration that could handle the above scenario, please do recommend it.
I think it's important to know that this is a pretty rare case. If your application has a very large user base, or there is some other reason you are concerned about DoS/DDoS attacks, it looks like Heroku provides some DDoS security for you. If you have your own server, I would suggest looking into Nginx or HAProxy as load balancers for your app, combined with fail2ban. See this tutorial.
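If you also want a last line of defense inside the app itself, a tiny per-IP throttle middleware is enough to shut out one runaway client. This is a naive sketch only; the window and threshold are arbitrary, and a real deployment should prefer the load-balancer options above:

// In-memory per-IP request counter for Express. Note: behind a proxy
// you must enable app.set('trust proxy', 1) for req.ip to be the client.
const hits = new Map();
const WINDOW_MS = 10 * 1000; // measurement window
const MAX_HITS = 100;        // requests allowed per IP per window

function throttle(req, res, next) {
  const now = Date.now();
  const entry = hits.get(req.ip) || { count: 0, start: now };
  if (now - entry.start > WINDOW_MS) {
    entry.count = 0;     // window expired: start counting afresh
    entry.start = now;
  }
  entry.count += 1;
  hits.set(req.ip, entry);
  if (entry.count > MAX_HITS) {
    return res.sendStatus(429); // Too Many Requests -- cheap to serve
  }
  next();
}

// usage: app.use(throttle);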

Node.js SSL server frozen, high CPU, not crashed but no connections

I hope someone can help me with this issue.
In our company we are setting up a Node.js server connected to a Java push server.
I'm using the https module instead of http, with SSL certificates.
The connection between Node and the clients is made with socket.io, on both server and client.
At the same time, the Node.js server is a client of the Java server; this connection is made with regular sockets (net.connect).
The idea is that users connect to the server and join some channels, and when data arrives from the Java server it is dispatched to the corresponding users.
Everything seems to work fine, but after a while, seemingly at random, with somewhere between 450 and 700 users connected, the server's CPU reaches 100% and all the connections break, yet the server has not crashed. The strange thing is that if you go to the https://... address in the browser, you don't get a 404 or the like but an SSL connection error, and it comes back really fast.
I tried adding logs everywhere, but there is no discernible pattern; it seems random.
If anybody has had the same problem, or could give me a clue or a tip on how to debug this better, I'd appreciate anything.
Thanks a lot.
Okay, the problem is solved. It is a problem that can occur on any Linux server, so if you are working with one of these, you need to read this.
The reason was the default limit on open files the Linux server has per process.
It seems that every single Linux server comes with a limit of 1024 files open per process; you can check your limit with:
# ulimit -n
To increase this number:
# ulimit -n 5000    # for example
Each socket creates a new virtual file, so each connection consumes one file descriptor.
For some reason my server was not displaying any error; it just froze, the log stopped, and there was no signal or evidence of anything. It was only when I set up a copy of the server on another machine that it started printing:
warn: error raised: Error: accept EMFILE
warn: error raised: Error: accept EMFILE
warn: error raised: Error: accept EMFILE
...
Be careful: if you are not root, you will only change this for the current session, not permanently.
Trick: if you want to count the number of files opened by your node process, take note of your process ID and run this command:
# ls -l /proc/XXXXX/fd | wc -l
where XXXXX is the process ID. This will help you determine whether this is your problem: once you launch your node server, use this command to check whether the count reaches a ceiling and stops growing right when the server freezes (by default 1024, or whatever "ulimit -n" reports).
If you only want to check which files are open by the process:
# ls -l /proc/XXXXX/fd
Hope this helps. In any case, if you are setting up a Node.js server, I'm pretty sure you will want to do this to be sure it won't melt down; a belt-and-braces option in the application itself is sketched below.
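That belt-and-braces option: Node's net/http(s) servers expose a maxConnections property, and connections beyond the cap are dropped instead of eating descriptors until EMFILE. A sketch under assumptions only; the certificate paths, port, and cap are placeholders, and the cap should stay below your ulimit:

const https = require('https');
const fs = require('fs');

// Hypothetical certificate paths; replace with your own.
const server = https.createServer({
  key: fs.readFileSync('key.pem'),
  cert: fs.readFileSync('cert.pem'),
}, (req, res) => res.end('ok\n'));

// Refuse connections beyond this cap rather than hitting EMFILE.
server.maxConnections = 900; // keep headroom under `ulimit -n`

// EMFILE and friends surface here instead of silently freezing things.
server.on('error', (err) => console.error('server error:', err));

server.listen(443, () => {
  // Log the live connection count every 10 s for monitoring.
  setInterval(() => {
    server.getConnections((err, count) => {
      if (!err) console.log('open connections: ' + count);
    });
  }, 10 * 1000);
});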
Finally, if you need to debug future errors that leave no log, you can try stracing (or, on macOS, dtrussing) the process:
# strace -p <process-id>
should do the job.
