nodejs express windows max connections setting - node.js

Where to set max connections for nodejs(for using express' get) in windows 10? Is it related to max files(descriptors) setting in linux? Is there a windows version of that setting? Preferably a setting in nodejs so it will be compatible when migrated to unix?
I suspect loadtest module gives error because of this setting when over 2000 concurrent connections to attack my server program that uses express and keeps connections in a queue to be processed later. Loadtest finishes normally for 200 concurrent connections(-c 200 in command line). Also when I don't keep connections in a queue and if operation in get is simple("response.end('hello world')"), it doesn't give error for -c 2000 maybe it finishes a work before other starts so its not 2000 concurrent really, only queued version has 2000 concurrency?
I'm not using http module but I am handling xlmhttprequest sent from clients on serverside express module's get function.
Maybe desktop version windows os wasn't developed with server apps in mind so it doesn't have a setting for max connections, max file accesses and related variables has been hardcoded in it?

To answer one part of your question, for max open file there is _setmaxstdio, see:
https://msdn.microsoft.com/en-us/library/6e3b887c(vs.71).aspx
Maybe you can write a wrapper that changes this and starts your Node program.
As for the general question of max open connections, this is what I found about some older Windows:
http://smallvoid.com/article/winnt-tcpip-max-limit.html
It talks about changing the config of:
[HKEY_LOCAL_MACHINE \System \CurrentControlSet \Services \Tcpip \Parameters]
TcpNumConnections = 0x00fffffe (Default = 16,777,214)
[HKEY_LOCAL_MACHINE \System \CurrentControlSet \Services \Tcpip \Parameters]
MaxUserPort = 5000 (Default = 5000, Max = 65534)
See also other related parameters in the above link. I don't know if the names of the config values changed in more recent versions of Windows but HKEY_LOCAL_MACHINE System CurrentControlSet Services Tcpip Parameters is something that I would search for if increasing the open files limit is not enough.
For a comparison, here is how I increase the number of concurrent Node/Express connection in Linux.

Related

Node.js Server Timeout Problems (EC2 + Express + PM2)

I'm relatively new to running production node.js apps and I've recently been having problems with my server timing out.
Basically after a certain amount of usage & time my node.js app stops responding to requests. I don't even see routes being fired on my console anymore - it's like the whole thing just comes to a halt and the HTTP calls from my client (iPhone running AFNetworking) don't reach the server anymore. But if I restart my node.js app server everything starts working again, until things inevitable stop again. The app never crashes, it just stops responding to requests.
I'm not getting any errors, and I've made sure to handle and log all DB connection errors so I'm not sure where to start. I thought it might have something to do with memory leaks so I installed node-memwatch and set up a listener for memory leaks but that doesn't get called before my server stops responding to requests.
Any clue as to what might be happening and how I can solve this problem?
Here's my stack:
Node.js on AWS EC2 Micro Instance (using Express 4.0 + PM2)
Database on AWS RDS volume running MySQL (using node-mysql)
Sessions stored w/ Redis on same EC2 instance as the node.js app
Clients are iPhones accessing the server via AFNetworking
Once again no errors are firing with any of the modules mentioned above.
First of all you need to be a bit more specific about timeouts.
TCP timeouts: TCP divides a message into packets which are sent one by one. The receiver needs to acknowledge having received the packet. If the receiver does not acknowledge having received the package within certain period of time, a TCP retransmission occurs, which is sending the same packet again. If this happens a couple of more times, the sender gives up and kills the connection.
HTTP timeout: An HTTP client like a browser, or your server while acting as a client (e.g: sending requests to other HTTP servers), can set an arbitrary timeout. If a response is not received within that period of time, it will disconnect and call it a timeout.
Now, there are many, many possible causes for this... from more trivial to less trivial:
Wrong Content-Length calculation: If you send a request with a Content-Length: 20 header, that means "I am going to send you 20 bytes". If you send 19, the other end will wait for the remaining 1. If that takes too long... timeout.
Not enough infrastructure: Maybe you should assign more machines to your application. If (total load / # of CPU cores) is over 1, or your memory usage is high, your system may be over capacity. However keep reading...
Silent exception: An error was thrown but not logged anywhere. The request never finished processing, leading to the next item.
Resource leaks: Every request needs to be handled to completion. If you don't do this, the connection will remain open. In addition, the IncomingMesage object (aka: usually called req in express code) will remain referenced by other objects (e.g: express itself). Each one of those objects can use a lot of memory.
Node event loop starvation: I will get to that at the end.
For memory leaks, the symptoms would be:
the node process would be using an increasing amount of memory.
To make things worse, if available memory is low and your server is misconfigured to use swapping, Linux will start moving memory to disk (swapping), which is very I/O and CPU intensive. Servers should not have swapping enabled.
cat /proc/sys/vm/swappiness
will return you the level of swappiness configured in your system (goes from 0 to 100). You can modify it in a persistent way via /etc/sysctl.conf (requires restart) or in a volatile way using: sysctl vm.swappiness=10
Once you've established you have a memory leak, you need to get a core dump and download it for analysis. A way to do that can be found in this other Stackoverflow response: Tools to analyze core dump from Node.js
For connection leaks (you leaked a connection by not handling a request to completion), you would be having an increasing number of established connections to your server. You can check your established connections with netstat -a -p tcp | grep ESTABLISHED | wc -l can be used to count established connections.
Now, the event loop starvation is the worst problem. If you have short lived code node works very well. But if you do CPU intensive stuff and have a function that keeps the CPU busy for an excessive amount of time... like 50 ms (50 ms of solid, blocking, synchronous CPU time, not asynchronous code taking 50 ms), operations being handled by the event loop such as processing HTTP requests start falling behind and eventually timing out.
The way to find a CPU bottleneck is using a performance profiler. nodegrind/qcachegrind are my preferred profiling tools but others prefer flamegraphs and such. However it can be hard to run a profiler in production. Just take a development server and slam it with requests. aka: a load test. There are many tools for this.
Finally, another way to debug the problem is:
env NODE_DEBUG=tls,net node <...arguments for your app>
node has optional debug statements that are enabled through the NODE_DEBUG environment variable. Setting NODE_DEBUG to tls,net will make node emit debugging information for the tls and net modules... so basically everything being sent or received. If there's a timeout you will see where it's coming from.
Source: Experience of maintaining large deployments of node services for years.

Weird Tomcat outage, possibly related to maxConnections

In my company we experienced a serious problem today: our production server went down. Most people accessing our software via a browser were unable to get a connection, however people who had already been using the software were able to continue using it. Even our hot standby server was unable to communicate with the production server, which it does using HTTP, not even going out to the broader internet. The whole time the server was accessible via ping and ssh, and in fact was quite underloaded - it's normally running at 5% CPU load and it was even lower at this time. We do almost no disk i/o.
A few days after the problem started we have a new variation: port 443 (HTTPS) is responding but port 80 stopped responding. The server load is very low. Immediately after restarting tomcat, port 80 started responding again.
We're using tomcat7, with maxThreads="200", and using maxConnections=10000. We serve all data out of main memory, so each HTTP request completes very quickly, but we have a large number of users doing very simple interactions (this is high school subject selection). But it seems very unlikely we would have 10,000 users all with their browser open on our page at the same time.
My question has several parts:
Is it likely that the "maxConnections" parameter is the cause of our woes?
Is there any reason not to set "maxConnections" to a ridiculously high value e.g. 100,000? (i.e. what's the cost of doing so?)
Does tomcat output a warning message anywhere once it hits the "maxConnections" message? (We didn't notice anything).
Is it possible there's an OS limit we're hitting? We're using CentOS 6.4 (Linux) and "ulimit -f" says "unlimited". (Do firewalls understand the concept of Tcp/Ip connections? Could there be a limit elsewhere?)
What happens when tomcat hits the "maxConnections" limit? Does it try to close down some inactive connections? If not, why not? I don't like the idea that our server can be held to ransom by people having their browsers on it, sending the keep-alive's to keep the connection open.
But the main question is, "How do we fix our server?"
More info as requested by Stefan and Sharpy:
Our clients communicate directly with this server
TCP connections were in some cases immediately refused and in other cases timed out
The problem is evident even connecting my browser to the server within the network, or with the hot standby server - also in the same network - unable to do database replication messages which normally happens over HTTP
IPTables - yes, IPTables6 - I don't think so. Anyway, there's nothing between my browser and the server when I test after noticing the problem.
More info:
It really looked like we had solved the problem when we realised we were using the default Tomcat7 setting of BIO, which has one thread per connection, and we had maxThreads=200. In fact 'netstat -an' showed about 297 connections, which matches 200 + queue of 100. So we changed this to NIO and restarted tomcat. Unfortunately the same problem occurred the following day. It's possible we misconfigured the server.xml.
The server.xml and extract from catalina.out is available here:
https://www.dropbox.com/sh/sxgd0fbzyvuldy7/AACZWoBKXNKfXjsSmkgkVgW_a?dl=0
More info:
I did a load test. I'm able to create 500 connections from my development laptop, and do an HTTP GET 3 times on each, without any problem. Unless my load test is invalid (the Java class is also in the above link).
It's hard to tell for sure without hands-on debugging but one of the first things I would check would be the file descriptor limit (that's ulimit -n). TCP connections consume file descriptors, and depending on which implementation is in use, nio connections that do polling using SelectableChannel may eat several file descriptors per open socket.
To check if this is the cause:
Find Tomcat PIDs using ps
Check the ulimit the process runs with: cat /proc/<PID>/limits | fgrep 'open files'
Check how many descriptors are actually in use: ls /proc/<PID>/fd | wc -l
If the number of used descriptors is significantly lower than the limit, something else is the cause of your problem. But if it is equal or very close to the limit, it's this limit which is causing issues. In this case you should increase the limit in /etc/security/limits.conf for the user with whose account Tomcat is running and restart the process from a newly opened shell, check using /proc/<PID>/limits if the new limit is actually used, and see if Tomcat's behavior is improved.
While I don't have a direct answer to solve your problem, I'd like to offer my methods to find what's wrong.
Intuitively there are 3 assumptions:
If your clients hold their connections and never release, it is quite possible your server hits the max connection limit even there is no communications.
The non-responding state can also be reached via various ways such as bugs in the server-side code.
The hardware conditions should not be ignored.
To locate the cause of this problem, you'd better try to replay the scenario in a testing environment. Perform more comprehensive tests and record more detailed logs, including but not limited:
Unit tests, esp. logic blocks using transactions, threading and synchronizations.
Stress-oriented tests. Try to simulate all the user behaviors you can come up with and their combinations and test them in a massive batch mode. (ref)
More specified Logging. Trace client behaviors and analysis what happened exactly before the server stopped responding.
Replace a server machine and see if it will still happen.
The short answer:
Use the NIO connector instead of the default BIO connector
Set "maxConnections" to something suitable e.g. 10,000
Encourage users to use HTTPS so that intermediate proxy servers can't turn 100 page requests into 100 tcp connections.
Check for threads hanging due to deadlock problems, e.g. with a stack dump (kill -3)
(If applicable and if you're not already doing this, write your client app to use the one connection for multiple page requests).
The long answer:
We were using the BIO connector instead of NIO connector. The difference between the two is that BIO is "one thread per connection" and NIO is "one thread can service many connections". So increasing "maxConnections" was irrelevant if we didn't also increase "maxThreads", which we didn't, because we didn't understand the BIO/NIO difference.
To change it to NIO, put this in the element in server.xml:
protocol="org.apache.coyote.http11.Http11NioProtocol"
From what I've read, there's no benefit to using BIO so I don't know why it's the default. We were only using it because it was the default and we assumed the default settings were reasonable and we didn't want to become experts in tomcat tuning to the extent that we now have.
HOWEVER: Even after making this change, we had a similar occurrence: on the same day, HTTPS became unresponsive even while HTTP was working, and then a little later the opposite occurred. Which was a bit depressing. We checked in 'catalina.out' that in fact the NIO connector was being used, and it was. So we began a long period of analysing 'netstat' and wireshark. We noticed some periods of high spikes in the number of connections - in one case up to 900 connections when the baseline was around 70. These spikes occurred when we synchronised our databases between the main production server and the "appliances" we install at each customer site (schools). The more we did the synchronisation, the more we caused outages, which caused us to do even more synchronisations in a downward spiral.
What seems to be happening is that the NSW Education Department proxy server splits our database synchronisation traffic into multiple connections so that 1000 page requests become 1000 connections, and furthermore they are not closed properly until the TCP 4 minute timeout. The proxy server was only able to do this because we were using HTTP. The reason they do this is presumably load balancing - they thought by splitting the page requests across their 4 servers, they'd get better load balancing. When we switched to HTTPS, they are unable to do this and are forced to use just one connection. So that particular problem is eliminated - we no longer see a burst in the number of connections.
People have suggested increasing "maxThreads". In fact this would have improved things but this is not the 'proper' solution - we had the default of 200, but at any given time, hardly any of these were doing anything, in fact hardly any of these were even allocated to page requests.
I think you need to debug the application using Apache JMeter for number of connection and use Jconsole or Zabbix to look for heap space or thread dump for tomcat server.
Nio Connector of Apache tomcat can have maximum connections of 10000 but I don't think thats a good idea to provide that much connection to one instance of tomcat better way to do this is to run multiple instance of tomcat.
In my view best way for Production server: To Run Apache http server in front and point your tomcat instance to that http server using AJP connector.
Hope this helps.
Are you absolutely sure you're not hitting the maxThreads limit? Have you tried changing it?
These days browsers limit simultaneous connections to a max of 4 per hostname/ip, so if you have 50 simultaneous browsers, you could easily hit that limit. Although hopefully your webapp responds quickly enough to handle this. Long polling has become popular these days (until websockets are more prevalent), so you may have 200 long polls.
Another cause could be if you use HTTP[S] for app-to-app communication (that is, no browser involved). Sometimes app writers are sloppy and create new connections for performing multiple tasks in parallel, causing TCP and HTTP overhead. Double check that you are not getting an inflood of requests. Log files can usually help you on this, or you can use wireshark to count the number of HTTP requests or HTTP[S] connections. If possible, modify your API to handle multiple API calls in one HTTP request.
Related to the last one, if you have many HTTP/1.1 requests going across one connection, and intermediate proxy may be splitting them into multiple connections for load balancing purposes. Sounds crazy I know, but I've seen it happen.
Lastly, some crawl bots ignore the crawl delay set in robots.txt. Again, log files and/or wireshark can help you determine this.
Overall, run more experiments with more changes. maxThreads, https, etc. before jumping to conclusions with maxConnections.

Node.js performance optimization involving HTTP calls

I have a Node.js application which opens a file, scans each line and makes a REST call that involves Couchbase for each line. The average number of lines in a file is about 12 to 13 million. Currently without any special settings my app can completely process ~1 million records in ~24 minutes. I went through a lot of questions, articles, and Node docs but couldn't find out any information about following:
Where's the setting that says node can open X number of http connections / sockets concurrently ? and can I change it?
I had to regulate the file processing because the file reading is much faster than the REST call so after a while there are too many open REST requests and it clogs the system and it goes out of memory... so now I read 1000 lines wait for the REST calls to finish for those and then resume it ( i am doing it using pause and resume methods on stream) Is there a better alternative to this?
What all possible optimizations can I perform so that it becomes faster than this. I know the gc related config that prevents from frequent halts in the app.
Is using "cluster" module recommended? Does it work seamlessly?
Background: We have an existing java application that does exactly same by spawning 100 threads and it is able to achieve slightly better throughput than the current node counterpart. But I want to try node since the two operations in question (reading a file and making a REST call for each line) seem like perfect situation for node app since they both can be async in node where as Java app makes blocking calls for these...
Any help would be greatly appreciated...
Generally you should break your questions on Stack Overflow into pieces. Since your questions are all getting at the same thing, I will answer them. First, let me start with the bottom:
We have an existing java application that does exactly same by spawning 100 threads ... But I want to try node since the two operations in question ... seem like perfect situation for node app since they both can be async in node where as Java app makes blocking calls for these.
Asynchronous calls and blocking calls are just tools to help you control flow and workload. Your Java app is using 100 threads, and therefore has the potential of 100 things at a time. Your Node.js app may have the potential of doing 1,000 things at a time but some operations will be done in JavaScript on a single thread and other IO work will pull from a thread pool. In any case, none of this matters if the backend system you're calling can only handle 20 things at a time. If your system is 100% utilized, changing the way you do your work certainly won't speed it up.
In short, making something asynchronous is not a tool for speed, it is a tool for managing the workload.
Where's the setting that says node can open X number of http connections / sockets concurrently ? and can I change it?
Node.js' HTTP client automatically has an agent, allowing you to utilize keep-alive connections. It also means that you won't flood a single host unless you write code to do so. http.globalAgent.maxSocket=1000 is what you want, as mentioned in the documentation: http://nodejs.org/api/http.html#http_agent_maxsockets
I had to regulate the file processing because the file reading is much faster than the REST call so after a while there are too many open REST requests and it clogs the system and it goes out of memory... so now I read 1000 lines wait for the REST calls to finish for those and then resume it ( i am doing it using pause and resume methods on stream) Is there a better alternative to this?
Don't use .on('data') for your stream, use .on('readable'). Only read from the stream when you're ready. I also suggest using a transform stream to read by lines.
What all possible optimizations can I perform so that it becomes faster than this. I know the gc related config that prevents from frequent halts in the app.
This is impossible to answer without detailed analysis of your code. Read more about Node.js and how its internals work. If you spend some time on this, the optimizations that are right for you will become clear.
Is using "cluster" module recommended? Does it work seamlessly?
This is only needed if you are unable to fully utilize your hardware. It isn't clear what you mean by "seamlessly", but each process is its own process as far as the OS is concerned, so it isn't something I would call "seamless".
By default, node uses a socket pool for all http requests and the default global limit is 5 concurrent connections per host (these are re-used for keepalive connections however). There are a few ways around this limit:
Create your own http.Agent and specify it in your http requests:
var agent = new http.Agent({maxSockets: 1000});
http.request({
// ...
agent: agent
}, function(res) { });
Change the global/default http.Agent limit:
http.globalAgent.maxSockets = 1000;
Disable pooling/connection re-use entirely for a request:
http.request({
// ...
agent: false
}, function(res) { });

how to identify max number of TCP requests limit

I am running redis-benchmark tool to send N number of requests From server A to B.
This tools generates TCP requests and receives response.
Some how when number requests reach to 51000, it stops and not exceeding above that.
I have tried the same using different machine and I got almost 100000 requests proccessed per second.
What sort of factors can limit these number of requests ??
A major factor would be the number of open file descriptors the process is allowed to create. This would be true for both the server and client side.
http://redis.io/topics/clients and http://redis.io/topics/benchmarks both have the information you should work through to determine where exactly your problem is. Without the details of your setup it is unlikely we can be more specific.
Check your ulimits and your server configuration to ensure you've configured your respective systems to the limits you intend to benchmark to and you'll be able to get more usable data.

Node js avoid pyramid of doom and memory increases at the same time

I am writing a socket.io based server and I'm trying to avoid the pyramid of doom and to keep the memory low.
I wrote this client - http://jsfiddle.net/QUDXU/1/ which i run with node client-cluster 1000. So 1000 connections that are making continuous requests.
For the server side a tried 3 different solutions which i tested. The results in terms of RAM used by the server, after i let everything run for an hour are:
Simple callbacks - http://jsfiddle.net/DcWmJ/ - 112MB
Q module - http://jsfiddle.net/hhsja/1/ - 850MB and increasing
Async module - http://jsfiddle.net/SgemT/ - 1.2GB and increasing
The server and clients are on different machines. (Softlayer cloud instances). Node 0.10.12 and Socket.io 0.9.16
Why is this happening? How can I keep the memory low and use some kind of library which allows to keep the code readable?
Option 1. You can use the cluster module and gracefully kill your workers from time to time (make sure you disconnect() first). You can check process.memoryUsage().rss > 130000000 in the master and kill the workers when they exceed 130MB, for example :)
Option 2. NodeJS has the habit of using memory and rarely doing rigorous cleanups. As V8 reaches the maximum memory limit, GC calls are more aggressive. So you could lower the maximum memory a node process can take up by running node --max-stack-size <amount>. I do this when running node on embedded devices (often with less than 64 MB of ram available).
Option 3. If you really want to keep the memory low, use weak references where it is possible (anywhere except in long-running calls) https://github.com/TooTallNate/node-weak . This way, the objects will get garbage collected sooner. Extensive tests to make sure everything works are needed, though. GL if u use this one :) https://github.com/TooTallNate/node-weak
It seems like the problem was on the client script, not on the server one. I ran 1000 processes, each of them emitting messages to the server at every second. I think the server was getting very busy resolving all of those requests and thus using all of that memory. I rewrote the client side like this, spawning a number of processes proportional to the number of processors, each of them connecting multiple times like this:
client = io.connect(selectedEnvironment, { 'force new connection': true, 'reconnect': false });
Notice the 'force new connection' flag that allows to connect multiple clients using the same instance of socket.io-client.
The part that solved my problem was actually how the requests were made: any client would make another request after a second from receiving the acknowledge of the previous request, not at every second.
Connecting 1000 clients is making my server using ~100MB RSS. I also used async on the server script which seems very elegant and easier to understand than Q.
The bad part is that I've been running the server for about 2-3 days and the memory rised at 250MB RSS. This, I don't know why.

Resources