I work for a site that often get's attacked by bot networks. We have started to use this tool: http://deflate.medialayer.com/ which auto-bans ip's that have more open connections than the set value. By default it's set to 150, we are currently using 250.
I would like to know, how low is safe so that search bots and normal visitors do not get blocked?
I don't exactly know, but I have added a WARN_LIMIT feature to the ddos.sh script so that you can set a threshold that will not get banned but you will still get warned. So you can run this script for a while with conservative limits and then apply stricter limits after you get an idea about the real-world usage.
https://github.com/colinmollenhour/ddos-deflate
Modern browsers may open up to 250 connections in total (Firefox on Windows is limited to 48 in FF 8, and by default somewhere between 4 and 16 for a single server.
In Firefox the setting is named Network.http.max-connections-per-server and defaults to 8. AFAIK Chrome has a default of 6. However, because of delays on connection timeouts the number of open connections that DDOS Deflate gets from netstat might be higher, maybe up to 30-40.
So from what I've read from various search results like this one on Lighttpd is that 100 should be a safe number that won't ban regurlar users.
Related
I have a working prototype of a concurrent Scala program using Actors. I am now trying to fine tune the number of different Actors, etc..
One stage of the processing requires fetching new data via the internet. Of course, there is nothing I can really do to speed that aspect up. However, I figure if I launch a bunch of requests in parallel, I can bring down the total time. The question, therefore, is:
=> Is there a limit on concurrent networking in Scala or on Unix systems (such as max num sockets)? If so, how can I find out what it is.
In Linux, there is a limit on the number of open file descriptors each program can have open. This can be seen using the ulimit -n. There is a system-wide limit in /proc/sys/kernel/file-max.
Another limit is the number of connections that the Linux firewall can track. If you are using the iptables connection tracking firewall this value is in /proc/sys/net/netfilter/nf_conntrack_max.
Another limit is of course TCP/IP itself. You can only have 65534 connections to the same remote host and port because each connection needs a unique combination of (localIP, localPort, remoteIP, remotePort).
Regarding speeding things up via concurrent connections: it isn't as easy as just using more connections.
It depends on where the bottlenecks are. If your local connection is being fully used, adding more connections will only slow things down. If you are connecting to the same remote server and its connection is fully used, more will only slow it down.
Where you can get a benefit is when your local connection is not fully used and you are connecting to multiple remote hosts.
If you look at web browsers, you will see they have limits on how many connections will be made to the same remote server. They also have limits on how many connections will be made in total.
I'm running a dedicated proxy server with Squid, and I'm trying to get a feel for the maximum number of connections that the server can handle. I've realized this comes down to available file descriptors on the Linux machine.
I've found plenty of information on increasing maximum file descriptors, but I'd like to find out the theoretical maximum. According to the StackOverflow question "Why do operating systems limit file descriptors?", it comes down to available system RAM, which makes plenty of sense.
Now, given how much RAM I have available, how can I determine a maximum value for file descriptors for the operating system? Some value which would obviously still allow the system to run stably.
Perhaps someone might have an idea given other high-end production servers? What is the 'norm' for maxing out the potential number of simultaneous connections (file descriptors)? Any insight into how I can max-out file descriptors for a Linux system would be greatly appreciated.
You have many limits.
Multiplexing. This shouldn't be an issue if your application uses a decent backend. Libev claims to multiplex with 350us latency at 100,000 file descriptors.
Application speed. A 1ms application latency at that scale (pretty low) per request would take almost two minutes to serve 100,000 requests in optimum conditions.
Bandwidth. Depending on your application and protocol efficiency, this may be a problem. You say it's a squid proxy... if you're proxying websites: a client with no cache requesting a website can receive anywhere from a few hundred KB to several MB. If your average full page request per client was 500KB, you'd max out a full gigabit connection at 2000 requests per second. This might be your limiting factor.
2000 file descriptors is a fairly small amount. I've seen simple apps in languages like Python scale to over 3000 active connections on a single processor core without bad latency.
You can test your squid proxy with software like apachebench running on multiple client computers to get some realistic numbers. It's pretty easy to crank your file descriptor limit up to 2000+ and see what happens, and whether it even makes a difference afterwards.
With a comet server running on node.js - how many simultaneous connections could we expect to get out of an EC2 server?
Anyone done this before and found a reasonable limit?
Our particular application only needs to push data to the clients fairly infrequently, it's more the max simultaneous connections per server that is a worry for us. We're looking at somewhere between 200k - 500k i think, and i'm trying to figure out if comet is going to be workable without a monstrous fleet of servers...
If you are running Linux, get to know the contents of /proc/sys/net/ipv4
In particular, net.ipv4.netfilter.ip_conntrack_max will let you increase the maximum number of open connections, but when you start plugging in really big numbers you will run into other problems. For instance you might need to reduce orphan_retries because you will statistically be more likely to have orphans. And with really big numbers, it is entirely possible that kernel lookup algorithms will slow down significantly. You need to carefully tune the TCP settings.
If I were in your shoes, I would compare at least two OSes, such as Linux and FreeBSD or OpenSolaris/Illumos.
On FreeBSD you will need to change settings in /boot/loader.conf
On OpenSolaris/Illumos you will need to read the documentation for the ndd command.
What solutions do you have in place for handling bandwidth billing for your vhosts on a shared environment in apache? If you are using log parsing, does your solution scale well when the logs become very very large? Anyone using any sort of module out there for this?
There exist certain modules for Apache 1.x and 2.x that will allow you to set a maximum on the transfer amount, most of them keep track using the scoreboard file that Apache generates (when mod_status is enabled with ExtendedStatus on). The one I still have bookmarked from when I was looking for one is mod_curb, however it is not complete and at the current moment in time looks to only work on a server-wide scale and not for individual virtual hosts.
Apache modules can be set to be outbound filters, so you could write a costume module that would sit at the end of the chain, and add up all the outgoing packets, using the data that APR provides you can then add it to a counter for that specific domain/sub-domain. After that you have a choice of what to do with the data.
For specific examples, take a look at mod_deflate that Apache provides, to see how it sits at the end of the chain and compresses everything but the headers the server sends out. This should give you a good start.
As for log based processing, it becomes slower the more logs exist. This is just the nature of the beast. When we were using a log based solution we had a custom perl script that ran every 15 minutes. Eventually it would take longer than 15 minutes to parse, and since we had proper locking after a while multiple of these log processing perl scripts were now running, all waiting on each other. We ended up re-writing it with a simple call to tail -F, which then let perl parse each and every request as it came in, while not entirely efficient, it worked. The upside of that was that we were now able to update traffic statistics in near realtime so that clients were updated sooner rather than later if they went over their limits.
You could go the poor man's route, and use Webalizer or Awstats. Both of these will give you an idea of traffic based off of access logs, and can be done on a per virtual host basis. In the case of Awstats, I know once you start doing 10GB+ of traffic daily, it starts to consume resources. You can always nice it, but then you'll get your data next week, rather than when you actually need it. In the past with Webalizer I've had to use some hackery to get it to handle large access logs, by chunking up the logs to smaller pieces that it could manage. It didn't provide as many useful metrics from what I've done with it, but I've also never needed to save a server from it :)
If virtual host does not have own IP, there is no easier way than logfile parsing. Just use mod_logio to calculate actual bytes transferred. mod_logio handles broken connections, compressed data etc. correctly. You should be able to parse logs realtime using piped logs. Use BufferedLogs to scale further (just check that parser handles lines broken when buffered correctly). Parser should save data periodically (like every minute) somewhere, just avoid locking issues as parsing must not slow down httpd. If httpd connections is spending time in L-state at server-status, you are too slow. After you have numbers, you can sum then further and then save data to billing system.
If you save billing logs as file too you can correct and doublecheck realtime traffic calculations. If you boot httpd you can end up missing some lines. But generally losing couple hundred requests is acceptable as it less than seconds worth on a high volume site.
There is modules that try to handle and limit bandwidth, like mod_cband and mod_bw. But they don't work when you have same vhost on multiple machines. I guess they would work ok on smaller scale.
If you have IP per vhost you could try IP based methods like feeding firewall logs to traffic calculator. Simple way is to use iptables.
Although we use IIS rather than apache we do use log file analysis for bandwidth billing (and bandwidth profiling / analysis). We use a custom application to load data collected in the log files in one hour increments, and act upon any required notifications or bandwidth overuse.
The log file loader runs as a low priority process, so as not to interupt operation of the server. Even on high usage servers with a large number of sites, processing takes less than 15 minutes, so we don't see scalability as a problem with this methodology.
There may be better ways of doing this, but this is perfectly adequate for what we need. I look forward to viewing the other responses.
It can be easily achieved with mod_cband. We've rewritten the module to fix a few bugs, provide true redundancy on restarts and incorporate FTP and Mail statistics.
http://www.howtoforge.com/mod_cband_apache2_bandwidth_quota_throttling
Well mod_cband would be great, except for when i'm using it, the max_connections (the overall, total value for every client combined), decides to crawl upwards until it hits the max value i've set. when it does reach the highest value, it just stays there and leaves all my clients receiving a constant "503 Service Temporarily Unavailable" error.
for example, i set "CbandSpeed 1000Mbps 500 1200", and the server connections crawls up to 1200 in about 8 hrs, then stays there. at this point, i count the total number of connections under Remote Clients in the mod_cband status window, and i see around 50. i've also used ps aux and i see around the same amount (~50) open http processes, which is normal, except for the fact that nobody can access the site at all because of the 503 errors.
Any ideas what could be wrong, or can this be fixed?
We have a dedicated godaddy server and it seemed to grind to a halt when we had users downloading only 3MB every 2 seconds (this was over about 20 http requests).
I want to look into database locking etc. to see if that is a problem - but first I'm curious as to what a dedicated server ought to be able to serve.
to help diagnose the problem, host a large file and download it. That will give you the transfer that the server and your web server can cope with. If the transfer rate is poor, then you know its the network, server or webserver.
If its acceptable or good, then you know its the means you have of generating those 3MB files.
check, measure and calculate!
PS. download the file over a fast link, you don't want the bottleneck to be your 64kbps modem :)
A lot depends on what the 3MB is. Serving up 1.5MBps of static data is way, way, way, within the bounds of even the weakest server.
Perhaps godaddy does bandwidt throtling? 60MB downloads every 2 seconds might fire some sort of bandwidt protection (either to protect their service or you from being overcharged, or both).
Check netspeed.stanford.edu from the dedicated server and see what your inbound and outbound traffic is like.
Also make sure your ISP is not limiting you at 10MBps (godaddy by default limits to 10Mbps and will set it at 100Mbps on request)