I'm running a dedicated proxy server with Squid, and I'm trying to get a feel for the maximum number of connections that the server can handle. I've realized this comes down to available file descriptors on the Linux machine.
I've found plenty of information on increasing maximum file descriptors, but I'd like to find out the theoretical maximum. According to the StackOverflow question "Why do operating systems limit file descriptors?", it comes down to available system RAM, which makes plenty of sense.
Now, given how much RAM I have available, how can I determine a maximum value for file descriptors for the operating system? Some value which would obviously still allow the system to run stably.
Perhaps someone might have an idea given other high-end production servers? What is the 'norm' for maxing out the potential number of simultaneous connections (file descriptors)? Any insight into how I can max-out file descriptors for a Linux system would be greatly appreciated.
You have many limits.
Multiplexing. This shouldn't be an issue if your application uses a decent backend. Libev claims to multiplex with 350us latency at 100,000 file descriptors.
Application speed. Even a low 1 ms of application latency per request means almost two minutes to serve 100,000 requests under optimum conditions.
Bandwidth. Depending on your application and protocol efficiency, this may be a problem. You say it's a Squid proxy... if you're proxying websites, a client with no cache requesting a site can receive anywhere from a few hundred KB to several MB. If the average full-page transfer per client is 500 KB, a full gigabit connection (roughly 125 MB/s) tops out around 250 such requests per second. This might be your limiting factor.
2000 file descriptors is a fairly small amount. I've seen simple apps in languages like Python scale to over 3000 active connections on a single processor core without bad latency.
You can test your squid proxy with software like apachebench running on multiple client computers to get some realistic numbers. It's pretty easy to crank your file descriptor limit up to 2000+ and see what happens, and whether it even makes a difference afterwards.
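If you want a rough way to see how many simultaneous connections a single client can actually hold open against the proxy, a small Node script like this can help. It's only a sketch: the host, port and target count are assumptions, so point them at your own Squid box (and remember the test client's own ulimit -n can become the bottleneck first).

```js
// Crude connection-count test: open lots of TCP sockets to the proxy and
// report when connections start failing (EMFILE, ECONNREFUSED, timeouts...).
// HOST, PORT and TARGET are assumptions -- adjust them for your setup.
const net = require('net');

const HOST = '192.0.2.10'; // your Squid server
const PORT = 3128;         // default Squid port
const TARGET = 5000;       // how many sockets to attempt

let open = 0;
let failed = 0;

for (let i = 0; i < TARGET; i++) {
  const sock = net.connect(PORT, HOST, () => {
    open++;
  });
  sock.on('error', (err) => {
    failed++;
    if (failed === 1) {
      console.error('first failure after %d open sockets: %s', open, err.code);
    }
  });
}

// Print a running tally every second; stop with Ctrl+C.
setInterval(() => {
  console.log('open=%d failed=%d', open, failed);
}, 1000);
```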
Hey guys, I am having a big problem and I need some advice.
I have a dedicated server with the following specs:
Atom C2750, 8 cores / 8 threads, 2.4 / 2.6 GHz
16 GB RAM DDR3 1600 MHz
12 TB storage
500 Mbps bandwidth
1 Gbps network burst
I am running a website using Node.js where users can download large files.
The website grew rapidly and I now have 10K users per day and an average of 1K concurrent users (downloads).
The problem is that download speeds on the client side keep getting lower, so I added a throttle of 800 Kb/s to the downloads. It helped a bit, but the problem remains the same. What should I do?
Thanks
You have to figure out where your worst bottleneck is. You will have to design some measurements and tests, and maybe do some calculations, to help determine where it is.
Here are some possibilities:
Total server bandwidth over your server's Ethernet connection to the Internet. If you have 1K users all trying to download something and you have 500 Mbps of total bandwidth, then you're only going to get 0.5 Mbps or 500 Kbps per user (see the quick calculation after this list). At that point, you need to either shrink the data, reduce the number of users or increase your server bandwidth.
Server CPU. This should be easy to detect. Check your CPU utilization. If your node.js process is using 100% of a CPU, then you are CPU bound and you either need a faster computer or, if you have a multi-core CPU, you can cluster your server on the same host to get more cores working for you. If you don't have a multi-core CPU, then get one, or cluster across multiple servers (the idea is that you need more CPUs). Though I have no idea if you are CPU bound (I suspect you're more likely bandwidth bound), the Atom C2750 has 8 cores, so it would be good for clustering, but each core is not particularly fast compared to other Intel CPUs.
Network card. It's possible that your network card could be holding you back and not fully saturating your bandwidth. For example, if you only had a 100Mbps network connection to your server, then that's the max bandwidth you can use. If you think you should have a 1Gbps network connection to your server, then you need to make sure you actually are getting that fast a link.
FYI, the 1Gbps Network Burst probably doesn't help you much if you have lots of users downloading stuff over a longer period of time. That is most useful for a sudden and short peak of activity, not for a continuous high load.
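To make the first point concrete, here is the quick calculation referred to above, using only the figures from the question. It's just arithmetic, but it shows why the 800 Kb/s throttle alone cannot fix things.

```js
// Back-of-the-envelope: bandwidth share per concurrent download,
// using the numbers quoted in the question.
const linkMbps = 500;             // server uplink
const concurrentDownloads = 1000; // average concurrent users

const perUserMbps = linkMbps / concurrentDownloads; // 0.5 Mbps
const perUserKBps = (perUserMbps * 1000) / 8;       // 62.5 KB/s

console.log(perUserMbps + ' Mbps per user, about ' + perUserKBps + ' KB/s');
// -> 0.5 Mbps per user, about 62.5 KB/s
// That is already below the 800 Kb/s (~100 KB/s) throttle, so the link
// itself is the ceiling, not the throttle.
```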
This may be beyond the scope of what Node.js can control, but any pointers to other solutions to this problem are appreciated as well.
I'm uploading large files to a Node.js service via HTTP POST. When this runs locally or on the LAN, the chunks the server receives are always 65536 bytes in size and uploads are fast.
When I upload the same file to the same code running on a remote server (a Google Cloud VM, or real hardware at a co-lo) the received chunks are between 1448-2869 bytes and uploads are much slower (far below the bandwidth limits of the connection).
I'm not sure where the decision is being made to send smaller chunks across the WAN connection, if it's a calculation performed by the client software (both curl and node.js clients produce the same result), or if it's routing hardware in-between slicing up the packets, or something completely different.
I'm wondering if there is anything I can do to force larger chunks through to the server, or perhaps an alternative approach that overcomes the server thrash associated with processing these small chunks?
After a lot of experimentation and discussing this with a network engineer, the heart of the issue is the limit imposed by the MTU once the connection goes beyond a direct Ethernet link: with a typical ~1500-byte MTU, each TCP segment carries only around 1448 bytes of payload after protocol headers, which matches the chunk sizes I was seeing.
My solution was to modify the application to use multiple simultaneous requests to essentially "fill the gaps" created by the network overhead of the smaller chunks. The result was substantially faster network transfers at the cost of a little increased complexity.
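As an illustration only, here is a rough Node sketch of that multi-request idea: split the file into byte ranges and POST several ranges at once. The host, path and X-Chunk-Offset header are made up for the example, so adapt them to however your server actually reassembles the parts.

```js
// Sketch: upload one large file as several concurrent ranged POSTs.
const fs = require('fs');
const http = require('http');

const FILE = process.argv[2]; // path to the file to upload
const PARALLEL = 4;           // number of simultaneous requests

const size = fs.statSync(FILE).size;
const partSize = Math.ceil(size / PARALLEL);

function uploadPart(start, end) {
  return new Promise((resolve, reject) => {
    const req = http.request({
      host: 'example.com',        // assumed upload host
      path: '/upload',            // assumed endpoint
      method: 'POST',
      headers: {
        'Content-Length': end - start + 1,
        'X-Chunk-Offset': start,  // hypothetical header the server reassembles on
      },
    }, (res) => { res.resume(); res.on('end', resolve); });
    req.on('error', reject);
    fs.createReadStream(FILE, { start, end }).pipe(req); // 'end' is inclusive
  });
}

const parts = [];
for (let start = 0; start < size; start += partSize) {
  parts.push(uploadPart(start, Math.min(start + partSize, size) - 1));
}

Promise.all(parts)
  .then(() => console.log('upload complete'))
  .catch((err) => console.error('upload failed:', err));
```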
First - a little bit about my background: I have been programming for some time (10 years at this point) and am fairly competent when it comes to coding ideas up. I started working on web-application programming just over a year ago, and thankfully discovered Node.js, which made web-app creation feel a lot more like traditional programming. Now, I have a node.js app that I've been developing for some time and that is now running in production on the web. My main confusion stems from the fact that I am very new to the world of web development, and don't really know what's important and what isn't when it comes to monitoring my application.
I am using a Joyent SmartMachine, and looking at the analytics options that they provide is a little overwhelming. There are so many different options and configurations, and I have no clue what purpose each analytic really serves. For the questions below, I'd appreciate any answer, whether it's specific to Joyent's Cloud Analytics or completely general.
QUESTION ONE
Right now, my main concern is to figure out how my application is utilizing the server that I have it running on. I want to know if my application has the right amount of resources allocated to it. Does the number of requests that it receives make the server it's on overkill, or does it warrant extra resources? What analytics are important to look at for a NodeJS app for that purpose? (using both MongoDB and Redis on separate servers if that makes a difference)
QUESTION TWO
What other statistics are generally really important to look at when managing a server that's in production? I'm used to programs that run once to do something specific (e.g. a raytracer that finishes running once it has computed an image), as opposed to web-apps which are continuously running and interacting with many clients. I'm sure there are many things that are obvious to long-time server administrators that aren't to newbies like me.
QUESTION THREE
What's important to look at when dealing with NodeJS specifically? What are statistics/analytics that become particularly critical when dealing with the single-threaded event loop of NodeJS versus more standard server systems?
I have other questions about how databases play into the equation, but I think this is enough for now...
We have been running node.js in production for nearly a year, starting from 0.4 and currently on the 0.8 series. The web app is Express 2 and 3 based, with Mongo, Redis and memcached.
A few facts:
node does not handle a large V8 heap well; when it grows over 200 MB you will start seeing increased CPU usage
node always seems to leak memory, or at least to grow a large heap without actually using it. I suspect memory fragmentation, as V8 profiling and valgrind show no leaks in JS space or in the resident heap. Early 0.8 was awful in this respect; RSS could be 1 GB with a 50 MB heap
hanging requests are hard to track. We wrote our own middleware to monitor these, especially as our app is long-poll based
My suggestions.
use multiple instances per machine, at least one per CPU core. Balance with HAProxy, nginx or similar, with session affinity
write middleware to report hanging connections, i.e. ones your code never responded to or whose latency went over a threshold (see the sketch after this list)
restart instances often, at least weekly
write a poller that prints out memory stats from the process module once per minute
use supervisord and Fabric for easy process management
monitor CPU and the reported memory stats, and restart on a threshold
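As a rough illustration of the middleware and poller suggestions above, here is a sketch for an Express app; the 30-second "hanging" threshold, the port and the one-minute interval are just assumed values.

```js
// Sketch: report requests that take too long, and print memory stats
// once per minute. Threshold and interval are arbitrary examples.
const express = require('express');
const app = express();

const HANG_THRESHOLD_MS = 30 * 1000; // assumed threshold for a "hanging" request

app.use((req, res, next) => {
  const started = Date.now();
  const timer = setTimeout(() => {
    console.warn('possibly hanging request:', req.method, req.url);
  }, HANG_THRESHOLD_MS);
  res.on('finish', () => {
    clearTimeout(timer);
    const elapsed = Date.now() - started;
    if (elapsed > HANG_THRESHOLD_MS) {
      console.warn('slow request:', req.method, req.url, elapsed + 'ms');
    }
  });
  next();
});

// Memory poller: RSS and V8 heap sizes, once per minute.
setInterval(() => {
  const m = process.memoryUsage();
  console.log('rss=%dMB heapTotal=%dMB heapUsed=%dMB',
    Math.round(m.rss / 1048576),
    Math.round(m.heapTotal / 1048576),
    Math.round(m.heapUsed / 1048576));
}, 60 * 1000);

app.get('/', (req, res) => res.send('ok'));
app.listen(3000);
```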
Whatever the type of web app, NodeJS or otherwise, load testing will answer whether your application has the right amount of server resources. A good website I recently found for this is Load Impact.
The real question to answer is WHEN does the load time begin to increase as the number of concurrent users increases? A tipping point is reached when you get to a certain number of concurrent users, after which server performance starts to degrade. So load test according to how many users you expect to reach your website in the near future.
How can you estimate the amount of users you expect?
Installing Google Analytics or another analytics package on your pages is a must! This way you will be able to see how many daily users are visiting your website, and what is the growth of your visits from month-to-month which can help in predicting future expected visits and therefore expected load on your server.
Even if I know the number of users, how can I estimate actual load?
The answer is in the F12 developer tools available in all browsers. Open up your website in any browser and push F12 (or for Opera, Ctrl+Shift+I), which should open up the browser's development tools. On Firefox make sure you have Firebug installed; on Chrome and Internet Explorer it should work out of the box. Go to the Net or Network tab and then refresh your page. This will show you the number of HTTP requests and the bandwidth usage per page load!
So the formula to work out daily server load is simple:
Number of HTTP requests per page load X the average number of page loads per user per day X the expected number of daily users = Total HTTP requests to the server per day
And...
Number of MBs transferred per page load X the average number of page loads per user per day X the expected number of daily users = Total bandwidth required per day
I've always found it easier to calculate these figures on a daily basis and then extrapolate it to weeks and months.
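As a quick worked example of those two formulas (every number below is made up, just to show the arithmetic):

```js
// Worked example of the daily-load formulas with made-up numbers.
const requestsPerPageLoad = 50;   // from the browser's Network tab
const mbPerPageLoad = 2;          // MB transferred per page load
const pageLoadsPerUserPerDay = 5; // rough guess from analytics
const dailyUsers = 10000;         // e.g. from Google Analytics

const requestsPerDay = requestsPerPageLoad * pageLoadsPerUserPerDay * dailyUsers;
const mbPerDay = mbPerPageLoad * pageLoadsPerUserPerDay * dailyUsers;

console.log(requestsPerDay + ' HTTP requests per day');    // 2500000
console.log((mbPerDay / 1024).toFixed(1) + ' GB per day'); // 97.7
```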
Node.js is single threaded, so you should definitely start a process for every CPU your machine has. Cluster is by far the best way to do this and has the added benefit of being able to restart dead workers and to detect unresponsive workers.
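A minimal sketch of that setup with the built-in cluster module, forking one worker per core and re-forking whenever a worker dies; the HTTP handler and port are just placeholders.

```js
// Sketch: one worker per CPU core, re-forking workers that die.
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  const cores = os.cpus().length;
  for (let i = 0; i < cores; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker, code, signal) => {
    console.warn('worker %d died (%s), restarting', worker.process.pid, signal || code);
    cluster.fork(); // keep the pool at one worker per core
  });
} else {
  // Placeholder app -- replace with your real server.
  http.createServer((req, res) => res.end('ok')).listen(3000);
  console.log('worker %d listening', process.pid);
}
```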
You also want to do load testing until your requests start timing out or exceed what you consider a reasonable response time. This will give you a good idea of the upper limit your server can handle. Blitz is one of the many options to have a look at.
I have never used Joyent's statistics, but NodeFly and their node-nodefly-gcinfo are great tools for monitoring node processes.
With a comet server running on node.js - how many simultaneous connections could we expect to get out of an EC2 server?
Anyone done this before and found a reasonable limit?
Our particular application only needs to push data to the clients fairly infrequently; it's more the maximum simultaneous connections per server that is a worry for us. We're looking at somewhere between 200k and 500k, I think, and I'm trying to figure out if comet is going to be workable without a monstrous fleet of servers...
If you are running Linux, get to know the contents of /proc/sys/net/ipv4
In particular, net.ipv4.netfilter.ip_conntrack_max will let you increase the maximum number of open connections, but when you start plugging in really big numbers you will run into other problems. For instance you might need to reduce orphan_retries because you will statistically be more likely to have orphans. And with really big numbers, it is entirely possible that kernel lookup algorithms will slow down significantly. You need to carefully tune the TCP settings.
If I were in your shoes, I would compare at least two OSes, such as Linux and FreeBSD or OpenSolaris/Illumos.
On FreeBSD you will need to change settings in /boot/loader.conf
On OpenSolaris/Illumos you will need to read the documentation for the ndd command.
How does one determine the best number of maxSpare, minSpare and maxThreads, acceptCount etc in Tomcat? Are there existing best practices?
I do understand this needs to be based on hardware (e.g. per core) and can only be a basis for further performance testing and optimization on specific hardware.
the "how many threads problem" is quite a big and complicated issue, and cannot be answered with a simple rule of thumb.
Considering how many cores you have is useful for multi threaded applications that tend to consume a lot of CPU, like number crunching and the like. This is rarely the case for a web-app, which is usually hogged not by CPU but by other factors.
One common limitation is the lag between you and other external systems, most notably your DB. Each time a request arrives, it will probably query the database a number of times, which means streaming some bytes over a JDBC connection, then waiting for those bytes to arrive at the database (even if it's on localhost there is still a small lag), then waiting for the DB to consider the request, then waiting for the database to process it (the database itself will be waiting for the disk to seek to a certain region), etc...
During all this time, the thread is idle, so another thread could easily use those CPU resources to do something useful. It's quite common to see 40% to 80% of the time spent waiting on the DB response.
The same happens on the other side of the connection. While one of your threads is writing its output to the browser, the speed of the CLIENT connection may keep your thread idle, waiting for the browser to ack that a certain packet has been received. (This was quite an issue some years ago; recent kernels and JVMs use larger buffers to prevent your threads from idling that way. However, a reverse proxy in front of your web application server, even simply an httpd, can be really useful to keep people with bad internet connections from acting like a DDoS attack :) )
Considering these factors, the number of threads should usually be much higher than the number of cores you have. Even on a simple dual- or quad-core server, you should configure at least a few dozen threads.
So, what is limiting the number of threads you can configure?
First of all, each thread (used to) consume a lot of resources. Each thread has a stack, which consumes RAM. Moreover, each thread will actually allocate objects on the heap to do its work, consuming more RAM, and the act of switching between threads (context switching) is quite heavy for the JVM/OS kernel.
This makes it hard to run a server with thousands of threads "smoothly".
Given this picture, there are a number of techniques (mostly: try, fail, tune, try again) to determine more or less how many threads your app will need:
1) Try to understand where your threads spend their time. There are a number of good tools; even the jvisualvm profiler can be great, or a tracing aspect that produces summary timing stats. The more time threads spend waiting for something external, the more threads you can spawn to use the CPU during those idle times.
2) Determine your RAM usage. Given that the JVM will use a certain amount of memory (most notably the permgen space, usually up to a hundred megabytes; again, jvisualvm will tell you) independently of how many threads you use, try running with one thread, then with ten, then with one hundred, while stressing the app with JMeter or whatever, and see how heap usage grows. That can pose a hard limit.
3) Try to determine a target. Each user request needs a thread to be handled. If your average response time is 200 ms per "get" (it is better not to count the loading of images, CSS and other static resources), then each thread is able to serve 4-5 pages per second. If each user is expected to "click" every 3-4 seconds (depends: is it a browser game or a site with a lot of long texts?), then one thread will "serve 20 concurrent users", for whatever that is worth. If in the peak hour you have 500 distinct users hitting your site in one minute, then you need enough threads to handle that.
4) Crash test the high limit. Use JMeter, configure a server with a lot of threads on a spare virtual machine, and see how the response time gets worse when you go over a certain limit. More than the hardware, the thread implementation of the underlying OS is important here, but no matter what, it will hit a point where the CPU spends more time trying to figure out which thread to run than actually running it, and that number is not so incredibly high.
5) Consider how threads will impact other components. Each thread will probably use one (or maybe more than one) connection to the database; is the database able to handle 50/100/500 concurrent connections? Even if you are using a sharded cluster of NoSQL servers, does the server farm offer enough bandwidth between those machines? What else will run on the same machine as the web-app server? Apache httpd? Squid? The database itself? A local caching proxy to the database like mongos or memcached?
I've seen systems in production with only 4 threads + 4 spare threads, because the work done by that server was merely resizing images, so it was nearly 100% CPU bound, and others configured on more or less the same hardware with a couple of hundred threads, because the webapp was doing a lot of SOAP calls to external systems and spending most of its time waiting for answers.
Once you've determined the approximate minimum and maximum number of threads that is optimal for your webapp, I usually configure things this way:
1) Based on the constraints on RAM, other external resources and your experiments on context switching, there is an absolute maximum that must not be reached. So use maxThreads to limit it to about half or three quarters of that number.
2) If the application is reasonably fast (for example, it exposes REST web services that usually send a response in a few milliseconds), then you can configure a large acceptCount, up to the same number as maxThreads. If you have a load balancer in front of your web application server, set a small acceptCount instead; it's better for the load balancer to see unaccepted requests and switch to another server than to put users on hold on an already busy one.
3) Since starting a thread is (still) considered a heavy operation, use minSpareThreads to have a few threads ready when peak hours arrive. This again depends on the kind of load you are expecting. It is even reasonable to have minSpareThreads, maxSpareThreads and maxThreads set up so that an exact number of threads is always ready, never reclaimed, and performance is predictable. If you are running Tomcat on a dedicated machine, you can raise minSpareThreads and maxSpareThreads without any danger of hogging other processes; otherwise tune them down, because threads are resources shared with the rest of the processes running on the machine.